CHAPTER 1


INTRODUCTION 


This document describes an alternate set of instructions that may be used on the VIA C3
processor. The alternate instructions are the internal instructions of the VIA C3 processor and
provide substantial additional functionality beyond the x86 instruction set. The VIA C3 Alternate
Instruction Set Application Note describes how system software can enable these alternate
instructions. This document is a programming reference describing the encoding and operation of
the alternate instructions.


1.1 BASIC CONCEPTS 


The VIA C3 processor family is intended as a plug-replaceable, software-compatible alternative to
the Intel Pentium III processor. Accordingly, the VIA C3 processor normally executes compatible
instructions. The internal design of the VIA C3 processor, however, is quite different from the
Pentium III internal design. In particular, the VIA C3 processor comprises two major components:
a front-end that fetches x86 instruction bytes and translates them from x86 into internal
instructions, and an internal microprocessor that executes these internal instructions.


1.2 OVERVIEW OF THIS PROGRAMMING REFERENCE 


This Programming Reference is divided into sections describing internal instructions according to 
the registers used: 


- Chapter 1 - Introduction. Describes the different execution units and programmer's
  model of the registers.


- Chapter 2 - Instruction format. Describes the instruction format and bit field
  definitions.


- Chapter 3 - General instructions. These instructions operate on the general x86
  registers EAX, EBX, ECX, EDX, EBP, ESP, ESI, EDI as well as additional temporary
  general registers.


- Chapter 4 - Floating-point instructions. These instructions operate on the
  floating-point registers as well as additional temporary floating-point registers.


- Chapter 5 - MMX™ instructions. These instructions operate on the x86 MMX™
  registers as well as additional temporary MMX™ registers.


1.3 GENERAL PURPOSE REGISTERS 


There are 32 general purpose registers (GPRs) with usage similar to the x86 GPRs. GPR 0 always
returns zero and can never be written. GPR 31 has a special meaning in alternate instruction
mode: when referenced as the base on load/store instructions (not LEA), it is the forward path
data from the EA unit.


The GPRs have the required x86 functionality in that there are instruction controls that can select 
byte-oriented subsets (such as the low byte) of the 32-bit result data to be written into the result 
register. 


The x86 instruction translator and associated microcode use the GPRs to store some x86
architecture registers such as the x86 GPRs and the x86 selector registers. These registers are
directly referenced by code generated by the translator; thus, the mapping of EAX etc. into the
native GPRs is fixed and considered part of the ISA. Other use of the GPRs is known only to the x86
microcode and thus is not defined as part of the ISA. The table below shows the usage of all GPRs
and whether their use is known to the hardware (T means that the translator references the register,
and H means that there are special hardware semantics for this register).


(GPR usage table: only the following entries are legible in this copy.)

Exception handler temp

GPR 31: EA forward path for load/store instructions;
        normal data for non-load/store instructions

(1) XPUSH of 32 bits pushes 0x0000 | [15:0]


1.4 FLOATING POINT REGISTERS 


The floating point data registers are similar to those in x86 architecture except that: 


- there are extra scratch registers available, and
- all of the registers may be directly accessed in addition to the x86 stack semantics.


Reg         AsmLabel     Description                          Type
                         x86 FP Stack Register 0              RW
                         x86 FP Stack Register 1
                         x86 FP Stack Register 2              RW
                         x86 FP Stack Register 3
FP16:FP31   FP16:FP31    FP Scratch registers 16 to 31

(The register labels of the stack-register rows, and several further rows of this table, are not
legible in this copy.)





1.5 MMX™ REGISTERS 


XXXX 


1.6 INTERNAL PROCESSOR REGISTERS


XXXX. 


CRO... 


CHAPTER 2





INSTRUCTION FORMAT 


This chapter describes the format and bit fields of the alternate instructions. 


2.1 GENERAL FORMAT & PRIMARY OPCODES 


All alternate instructions are 32 bits long, with a 6-bit primary opcode field:


31:26 25:0 
Opcode Dependent 
6 26 


Some of the primary opcodes have extended opcodes in other bits of the instruction. 


2.1.1 PRIMARY OPCODES


The primary opcodes are:


        28:26→  0       1       2       3       4       5       6       7
31:29↓
  0
  1             ORIU    ADDI    ANDIU   ANDIL   ANDI    ORI     XORI    XORIU
  2                     (?)     XLFP    XSFP    MMX             XLMMX   XSMMX
  3
  4             XALU    XALUI   XALUR   XALUIR
  5             XMISC                   XLEAI   XLEAD
  6
  7

(Rows 0, 3, 6, and 7 and the entry marked (?) are not legible in this copy.)

The shaded opcodes have extended opcodes as defined in subsequent sections. Cross-hatched
unlabeled opcodes represent primary opcodes that are not included in the alternate instruction set.





2.2 INSTRUCTION FORMATS USED 


2.2.1 IMMEDIATE (ORI-TYPE) INSTRUCTION FORMATS 


Some alternate instructions use the immediate instruction format: 


31:26     25:21   20:16   15:0
Opcode    RS      RT      IMMEDIATE
6         5       5       16


The opcode is one of the primary opcodes. The source and target operands for these I-type
instructions are:


GPR[RT] ← GPR[RS] opcode IMMEDIATE


Note that the destination is RT in this case rather than RD as for R-type instructions (which are 
described in a later section). 
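
The field layout above can be illustrated with a small decoder. This is a minimal sketch, not VIA tooling: the names `IType`, `encode_itype`, and `decode_itype` are hypothetical, and the placement of RS in bits 25:21 and RT in bits 20:16 is an assumption based on the XALU format, since the field names did not survive in the format diagram.

```python
from typing import NamedTuple


class IType(NamedTuple):
    """Fields of a 32-bit I-type word (field positions partly assumed)."""
    opcode: int  # bits 31:26, primary opcode
    rs: int      # bits 25:21, source register (assumed position)
    rt: int      # bits 20:16, destination register (assumed position)
    imm: int     # bits 15:0, 16-bit immediate


def encode_itype(opcode: int, rs: int, rt: int, imm: int) -> int:
    """Pack the four fields into one 32-bit instruction word."""
    if not (0 <= opcode < 64 and 0 <= rs < 32 and 0 <= rt < 32 and 0 <= imm < 0x10000):
        raise ValueError("field out of range")
    return (opcode << 26) | (rs << 21) | (rt << 16) | imm


def decode_itype(word: int) -> IType:
    """Split a 32-bit instruction word into its I-type fields."""
    if not 0 <= word <= 0xFFFFFFFF:
        raise ValueError("instruction word must fit in 32 bits")
    return IType(
        opcode=(word >> 26) & 0x3F,
        rs=(word >> 21) & 0x1F,
        rt=(word >> 16) & 0x1F,
        imm=word & 0xFFFF,
    )
```

For example, `decode_itype(encode_itype(0b001001, 3, 5, 0x1234))` recovers the ADDI primary opcode 001001 together with the two register numbers and the immediate.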


The I-type instructions are:


        28:26→  0       1       2       3       4       5       6       7
31:29↓
  1             ORIU    ADDI    ANDIU   ANDIL   ANDI    ORI     XORI    XORIU


2-2 Instruction Format Chapter 2 


VIA Confidential VIA C3 Alternate Instruction Set Programming Reference 
November 2002 


2.2.1.1 XALU-Type Instruction Formats 


The XALU-type instruction format is used for x86-style ALU instructions defined using the 
XALU[I][R] primary opcodes. It has special control fields to allow most x86 ALU semantics to be 
specified in a single 32-bit instruction. 


31:26      25:21   20:16   15:11   10:0
XALU[R]    RS      RT      RD      Function
XALUI[R]   RS      Const   RD      Function
6          5       5       5       11


The source and target operands for XALU[R] instructions are basically the same as for R-type
instructions. XALUI[R] instructions are similar except that they allow an encoded immediate
value, Const, to be used instead of RT. Function is a multi-part field that defines the function
to be performed.


GPR[RD] ← GPR[RS] function GPR[RT]
GPR[RD] ← GPR[RS] function Const value


The R versions of the XALU[I][R] instructions cause the x86 result flags to be set as defined for 
the particular instruction type. 
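
As an illustration of the XALU-type layout, the sketch below splits an instruction word into its fields. It is not taken from VIA material: `XaluFields`, `decode_xalu`, `is_immediate`, and `sets_flags` are hypothetical names. The split of the 11-bit function field into Ind (10:8), DPcntl (7:5), and SubOp (4:0) follows section 2.2.1.3, and the reading of the two low opcode bits as the I and R variants is inferred from the XALU (100000), XALUI (100001), XALUR (100010), and XALUIR (100011) encodings used in Chapter 3.

```python
from typing import NamedTuple


class XaluFields(NamedTuple):
    opcode: int       # bits 31:26
    rs: int           # bits 25:21
    rt_or_const: int  # bits 20:16, RT for XALU[R], Const for XALUI[R]
    rd: int           # bits 15:11
    ind: int          # bits 10:8, must be zero in alternate instruction mode
    dpcntl: int       # bits 7:5
    subop: int        # bits 4:0


def decode_xalu(word: int) -> XaluFields:
    """Split a 32-bit XALU-type word into its fields."""
    function = word & 0x7FF  # bits 10:0
    return XaluFields(
        opcode=(word >> 26) & 0x3F,
        rs=(word >> 21) & 0x1F,
        rt_or_const=(word >> 16) & 0x1F,
        rd=(word >> 11) & 0x1F,
        ind=(function >> 8) & 0x7,
        dpcntl=(function >> 5) & 0x7,
        subop=function & 0x1F,
    )


def is_immediate(opcode: int) -> bool:
    """Inferred: opcode bit 0 selects the Const (I) form."""
    return bool(opcode & 0b01)


def sets_flags(opcode: int) -> bool:
    """Inferred: opcode bit 1 selects the flag-setting (R) form."""
    return bool(opcode & 0b10)
```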


Implementation Note: As described in the instruction definition section, not every version of every
extended opcode is needed for all of the XALU[I][R] instructions.


2.2.1.2 Const Field (for XALUI forms) 


The 5-bit Const field allows a small constant to be used as an immediate value. The following
table lists the values for Const (entries not listed are not legible in this copy):


Const    Mnemonic   Value
00000    0          Constant zero
00100    8          Constant eight
01100    -8         Constant minus eight
10000    OS         Operand size: +2 or +4 depending on TSR.OS
10010    6          Constant 6
10100    9          Constant 9
10110               OS-related shift magic count: 16 for OS==0, 0 for OS==
11001    MGS
11100    DFind1     Minus or Plus 1 respectively depending on EFLAGS.DF
11101    DFindOS    Minus or Plus Operand Size (2 or 4) (defined in TSR.OS)
                    respectively depending on EFLAGS.DF
11110    IMMED      Use value in the IMMED Register
11111    DISP       Use value in the DISP Register
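
The DF-dependent constants can be modeled as a signed step. This is a minimal sketch under stated assumptions: `df_step` and `advance_pointer` are hypothetical helpers, the operand size is passed directly in bytes rather than decoded from TSR.OS (whose encoding is not given here), and the sign convention (minus when DF is set) is the usual x86 string-instruction behavior.

```python
def df_step(df: int, size_bytes: int = 1) -> int:
    """Return the DFind1 / DFindOS style constant.

    df         -- EFLAGS.DF (0 or 1)
    size_bytes -- 1 for DFind1; 2 or 4 for DFindOS (assumed to come from TSR.OS)
    """
    if size_bytes not in (1, 2, 4):
        raise ValueError("size must be 1, 2, or 4 bytes")
    return -size_bytes if df else size_bytes


def advance_pointer(pointer: int, df: int, size_bytes: int) -> int:
    """Apply the step to a 32-bit index register, wrapping modulo 2**32."""
    return (pointer + df_step(df, size_bytes)) & 0xFFFFFFFF
```

A translator-generated string sequence could use such a constant to step ESI or EDI in either direction without testing DF explicitly.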





2.2.1.3 Function Codes for XALU-Type 


10:8    7:5      4:0
Ind     DPcntl   SubOp
3       3        5


SubOp Field


This field describes the ALU function. It is similar in concept to the 6-bit extended opcode
for special instructions. The extended opcodes for the primary opcodes are:


      2:0→  0      1      2      3      4      5      6      7
4:3↓
  0
  1                CMPS   DEC           IMUL          IDIV
  2         ADD    ADC    SUB    SBB    AND    OR     XOR    NOR
  3

(Only row 2 is fully legible in this copy. The row 1 entries shown are taken from the instruction
encodings in Chapter 3; the remaining entries are not legible.)


The fourth-row opcodes have special semantics over the normal logical operations: the C2 suffix
indicates that the destination is a CP2 control register (vs. a GPR). Only some CP2 registers may
be used as the destination.





DPcntl


The DPcntl field controls (1) the source byte selection, (2) the size of the writes into the register
file, and (3) the size of the result over which condition codes are calculated.


In the table below, BE indicates which bytes of the register file are written back, and CC is the
portion of the result over which the condition codes other than AF and PF are calculated (AF and PF
are always based on the carry from bit 3 and the low-order 8 bits of the result, respectively).
Condition codes are always calculated over the low-order bits of the result.


DPcntl   Mnemonic   BE   CC
(The rows of this table are not legible in this copy; the mnemonics are described below.)


32


Provides 32-bit operations. All 32 bits are written back to the register file. Condition codes are
calculated with an operand size of 32 bits.


16 


Provides for 16-bit operations. The low 16 bits are written back to the register file. Condition 
codes are calculated on the low-order 16 bits of the result. 


LL 


Provides for 8-bit low-low byte operations. The low 8 bits of each operand are used as the 
sources and the low 8 bits of the result are written back to the register file. Condition codes are 
calculated on the low-order 8 bits of the result. 


HL 


Provides for 8-bit high-low byte operations. Bits 15:8 of the left operand are shifted right and
operated on with the low byte of the right operand. Condition codes are calculated on the
low-order 8 bits of the result. The low 8 bits are shifted left and written into bits 15:8 of the
target register.


HH


Provides for 8-bit high-high byte operations. Bits 15:8 of both operands are shifted right and
operated on. Condition codes are calculated on the low-order 8 bits of the result. The low 8
bits are shifted left and written into bits 15:8 of the target register.


LH


Provides for 8-bit low-high byte operations. Bits 15:8 of the right operand are shifted right
and operated on with the low byte of the left operand. Condition codes are calculated on the
low-order 8 bits of the result. The low 8 bits are written into the low byte of the target register.


(reserved) 


Using reserved DPcntl codes will result in unpredictable results. 
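
The four byte-oriented modes can be summarized in a short model. This is a sketch for illustration only: `byte_op` is a hypothetical helper, the modes are selected by mnemonic because the numeric DPcntl codes are not legible here, and it assumes that bytes of the target register outside the written byte keep their previous value, as implied by the byte-enable (BE) description. Condition-code calculation is not modeled.

```python
from typing import Callable


def byte_op(mode: str, op: Callable[[int, int], int],
            left: int, right: int, target: int) -> int:
    """Model an 8-bit LL/HL/HH/LH operation on 32-bit register values.

    Returns the new 32-bit value of the target register.
    """
    if mode not in ("LL", "HL", "HH", "LH"):
        raise ValueError("unknown byte mode")
    # Source byte selection: the high byte (bits 15:8) is shifted right first.
    a = (left >> 8) & 0xFF if mode in ("HL", "HH") else left & 0xFF
    b = (right >> 8) & 0xFF if mode in ("HH", "LH") else right & 0xFF
    result = op(a, b) & 0xFF
    # Write-back: HL and HH write bits 15:8, LL and LH write the low byte.
    if mode in ("HL", "HH"):
        return (target & 0xFFFF00FF) | (result << 8)
    return (target & 0xFFFFFF00) | result
```

For instance, an x86 `ADD AH, BL` style operation corresponds to the HL mode with the same register as left operand and target.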


Ind


The Ind field is not useful for alternate instruction mode and must always be all zeros. Non-zero
values will result in unpredictable results.


2.2.2 XMISC-TYPE INSTRUCTION FORMATS 


The XMISC-type instruction format is used for miscellaneous x86-related instructions defined
using the XMISC primary opcode. It is like the XALU format except that the function field is
special for each particular instruction.


31:26 25:21 
Instruction 


Subfunction Field 


This field describes the specific instruction. It is similar in concept to the 6-bit extended
opcode for R-form instructions. The extended opcodes for the new primary opcode are:


       8:6→  0    1    2    3    4    5    6    7
10:9↓

(The entries of this table are not legible in this copy, apart from the mnemonics MTCNT and MFCNT.)


2.2.3 XLS-TYPE INSTRUCTION FORMATS


The new XLS-type instruction format is used for x86-style load and store (XLx, XSx) instructions.
This form is highly encoded to allow most x86 load/store semantics to be specified in a single
32-bit instruction.


31:26   25:21   20:16   15:11   10:0
6       5       5       5       11


The source and target operands for XLS-type instructions are similar to those for load and store
instructions (except that the Base register is in a different register field). The x86-style EA is
calculated as a base register plus an offset. The x86 selector register and other x86 addressing
semantics are specified in the function field.


Load:  RS ← memory(selector, GPR[Base] + Offset)
Store: RS → memory(selector, GPR[Base] + Offset)


This general instruction format is also used for XLEA instructions which do not actually perform 
load or stores but rather calculate an offset address. However, the XLEA instructions have some 
unique fields and should really be considered as special format instructions. 


Offset 


The 5-bit Offset field allows a small constant to be used as an immediate value. The following
table lists the values for Offset (entries not listed are not legible in this copy):


Offset   Mnemonic   Value
00000    0          Constant zero
00011    4          Constant four
00100    8          Constant eight
01100    -8         Constant minus eight
10000    OS         Operand size: +2 or +4 depending on TSR.OS
10001    PDOS
11001    MGS
         DFind1     Minus or Plus 1 respectively depending on EFLAGS.DF
         DFindOS    Minus or Plus Operand Size (2 or 4) (defined in TSR.OS)
                    respectively depending on EFLAGS.DF
11111    DISP       Use value in the DISP Register


Function Codes 









The XLx and XSx instructions are used to implement x86 load and store instructions. The function
field encodes most of the x86-peculiar load/store semantics:


10:9    8            7:6        5:2   1        0
SubOp   AddrSize-1   Size-2:1   Sel   Size-0   AddrSize-0
2       1            2          4     1        1


Note that GPR indirection (via the IIR) is not available in the normal load/store format. If
register indirection is needed, an XLEAx instruction (which allows indirection) must be used to
calculate the address, followed by the load/store instruction using the output register of the
XLEA as the Base register.


AddrSize 


Indicates the address size for the effective address calculation of this XLx or XSx instruction.


Sel 


Specifies the selector descriptor used for virtual address calculation and limit checking. The bit
encoding is (entries not listed are not legible in this copy):


Sel     Mnemonic   Description
0110    GDT
0111    LDT
1000    IDT
1001    TSS
        Temp0
1111    indSEL     Use value in field of the TSR


Architecture Note: The register encoding values for GDT and LDT are important: they differ
only in the low-order bit, which corresponds to the TI bit in a selector that selects GDT or LDT.
This bit has a mux on it that selects either the saved TI bit or the bit in the Sel field. This mux
is set up by the XTI instruction, but affects the following instruction, which is assumed to be an
XLDESC. See these instructions for more description of this magic.





Size


The size field indicates which portion of the destination register to update for loads, and the
size of the source operand for stores. It has different encodings for the various types of x86
load/store opcodes. The high-order two bits come from the Size-2:1 field, and the low-order
bit is bit 1 of the instruction. The XLEAD and XLEAI instructions have an additional
(highest-order) bit which comes from bit 2 of the instruction.


All Load/Store Instructions (Including LEAx) Except XLDESC, XLFP/XSFP

(The size codes and most mnemonics of this table are not legible in this copy. The surviving
entries are listed below.)

Size    Mnemonic   Description
        16H        XPUSH: operand size 16 (31:16) or 32 (31:16) zero extended
                   64 bits. In this case, the destination register (RS) must have an even
                   address; the data will be loaded into RS and RS | 1. Note that a store of
                   64 bits is an invalid (missing) instruction.
                   defined by TSR.DPcntl
                   stack address size: 16 or 32
1010               undefined
1011               undefined
1101               undefined
                   undefined


SubOp 


This field controls the use of the effective-address, linear-address, physical-address, and
protection-calculation hardware for performing operations other than simple loads and stores. It is
used mostly to precipitate exceptions before modifying the architectural state (thus facilitating
instruction restartability). It is also used to control bus operations.





Various types of load/store instructions have different encodings for this field: 


(Only fragments of the following SubOp tables are legible in this copy.)


SubOp Type 1 - For All Load/Store Instructions Except Below Types

SubOp   Mnemonic      Load Desc               Store Desc
01      str_int       Decr/test COUNT         String Semantics
                      Allow Interrupt         Allow Interrupt
        str_testcnt   Test COUNT              Test COUNT
                      Don't allow interrupt   Don't allow interrupt
                      Stack Segment checks    N/A
                                              N/A


SubOp Type 3 - For XLDESC_CS

SubOp   Mnemonic      Description
        spec          Checks for special microcode


SubOp Type 4 - XL2 & XS2 Instructions

SubOp   Mnemonic      Load Desc               Store Desc
00      norm          Normal load
        tickle lock   Tickle Locked load
                      Assume CPL = 0,         Assume CPL = 0,
                      Locked Load             Unlock Store


SubOp Type 5 - For XIO

SubOp   Mnemonic      Load Desc               Store Desc
00                    I/O Read                I/O Write
01      special       Interrupt Ack           Special Bus Cycle (type defined by low
                                              three address bits)


SubOp Type FP - XLFP & XSFP Instructions

SubOp   Mnemonic      Load Desc               Store Desc
00      (norm)
01
10
11      NORM REC      Load w/ RFP             Store w/ RFP

norm 








Performs a normal load or store operation as specified in the rest of the instruction. This is the 
default in assembler instructions and does not have to be specified. 


rwv (used only with loads) 


Performs the load operation as specified in the instruction, except that it performs all
protection and access right checks as though this were a store operation. This is meant to be used
to force read-modify-write operations that are going to fault on the write to fault on the read
instead. This is used to force exceptions to occur before the flags are modified, and to prevent
partial modification of the target memory location on unaligned references.


tickle 


Performs the effective-address, virtual-address, and physical-address calculations as specified in 
the instruction including all protection and access-right checks. However, the actual transfer of 
data is inhibited. This is meant to be used to verify that all parts of a data structure will not 
generate faults before a portion of it is modified. 


lock (used only with loads) 


Performs the load operation as specified in the instruction with x86 Lock semantics. The fully
compatible definition invalidates the cache line containing the data and asserts the LOCK#
signal on the external bus. Locked loads are also tested for write privileges as described above
for rwv.


The bus control unit manages the synchronization of asserting and deasserting LOCK#. It
basically counts consecutive locked load operations (reaching the bus unit) and deasserts
LOCK# after the last of an equivalent number of stores. However, the locked store portions of
the locked RMW sequence must be specified with a Lock SubOp.
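
The counting behavior described above can be pictured with a tiny state model. This is only a sketch of the stated rule (count locked loads, release after the same number of locked stores); `LockTracker` is a hypothetical name and the real bus control unit is not described in more detail here.

```python
class LockTracker:
    """Toy model of LOCK# assertion driven by locked loads and stores."""

    def __init__(self) -> None:
        self.pending = 0  # locked loads not yet matched by a locked store

    @property
    def lock_asserted(self) -> bool:
        """LOCK# is held while any locked load is still unmatched."""
        return self.pending > 0

    def locked_load(self) -> None:
        """A locked load reaches the bus unit: count it (LOCK# asserted)."""
        self.pending += 1

    def locked_store(self) -> None:
        """A store with the Lock SubOp: match one earlier locked load."""
        if self.pending == 0:
            raise RuntimeError("locked store without a preceding locked load")
        self.pending -= 1
```

With two locked loads followed by two locked stores, `lock_asserted` stays true until the second store completes.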


Note that the TSR.LK bit (the x86 instruction had a valid LOCK prefix) forces a LOCK
sequence only if the SubOp is rwv. If the SubOp is norm, for example, the LOCK prefix is
ignored.





str_int 
Adds the following semantics to the load/store instruction:


- If the COUNT register is 0 (considering any effect of the previous instruction), do not perform
  the load or store operation and signal “stop string generation” to the translator.


- Else, perform the operation and decrement COUNT by one (forwarding the result to the next
  instruction's test of COUNT).


In addition, an external interrupt, data breakpoint trap exception, or TF exception is allowed
to occur following successful completion of the associated load/store instruction.
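
The two cases of str_int can be sketched as a single step function. This is an illustrative model only: `str_int_step` is a hypothetical helper, the memory access is represented by a callback, and interrupt handling after the access is not modeled.

```python
from typing import Callable, Tuple


def str_int_step(count: int, access: Callable[[], None]) -> Tuple[int, bool]:
    """Model one load/store issued with the str_int SubOp.

    Returns (new_count, performed). performed is False when COUNT was
    already 0, which corresponds to signalling "stop string generation".
    """
    if count == 0:
        return 0, False      # skip the access and stop the translator
    access()                 # perform the load or store
    return count - 1, True   # decrement COUNT for the next instruction's test
```

Calling it repeatedly with a starting count of 2 performs exactly two accesses; the third call reports that string generation should stop.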


str_testcnt 
Adds the following semantics to the load/store instruction: 


If the COUNT register is 0 (considering any effect of the previous instruction), do 
not perform the load or store operation and signal “stop string generation” to the translator. 


special 


Allows generation of interrupt acknowledge on loads, and special cycles on stores. The low-order
bits of the address define the special bus operation (the byte enable lines on the bus define
the cycle type).


sup 


Performs the otherwise specified operation but performs all access right and paging tests as 
though CPL == 0. This is used for descriptor table accesses. 


suplk 


Performs the load with the semantics of sup and lock combined.


Unaligned Operations 


There are two different meanings of unaligned:


1. The data operand spans across two 8-byte-aligned memory units. This case is handled
   automatically by the hardware by decomposing the load/store operation into the correct two
   load/store operations for each portion. This will be called


2. The data is aligned such that it meets the x86 architecture definition of unaligned. In this
   case, if CPL == 3 && (EFLAGS.AC & CR0.AM) then an alignment exception occurs.
   This will be called


The architecture alignment boundary for data is the same as for the P54 (some of this must be
enforced by microcode).
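
The two notions of unaligned can be expressed as two independent predicates. This is a sketch under assumptions: `spans_two_units` and `alignment_fault` are hypothetical names, and the second function assumes natural alignment (address a multiple of the operand size) as the architectural rule, which the text only specifies by reference to the P54.

```python
def spans_two_units(addr: int, size: int) -> bool:
    """True if the operand crosses an 8-byte-aligned boundary (case 1)."""
    if size <= 0:
        raise ValueError("size must be positive")
    return (addr // 8) != ((addr + size - 1) // 8)


def alignment_fault(addr: int, size: int, cpl: int,
                    eflags_ac: bool, cr0_am: bool) -> bool:
    """True if an alignment exception would occur (case 2).

    Assumes natural alignment as the architectural boundary.
    """
    misaligned = addr % size != 0
    return misaligned and cpl == 3 and eflags_ac and cr0_am
```

The two conditions are independent: a 4-byte access at address 2 is architecturally unaligned but stays inside one 8-byte unit, while an 8-byte access at address 4 is both unaligned and split by the hardware.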


MMX Load and Store Instructions 


The MMX load and store instructions are formatted identically to the LS form load store
instructions, with the exception that the load target and the store source registers are MMX
operand registers. The size 64 store is a true size_64 store in that page protection checking,
etc. is performed for the entire 64-bit operand. Note that this is not the case in the standard
form store-64 instructions.


ADD LOAD_ALU INFO HERE!! 


FPU Load and Store Instructions 


The FPU load and store instructions are formatted identically to the LS form load store
instructions, with the exception that the load target and the store source registers are FPU
operand registers and the size field (bits 7, 6, and 1 of the instruction) is encoded to represent
the format of the FPU load/store (see table below). The size 64 store is a true size_64 store in
that page protection checking, etc. is performed for the entire 64-bit operand. Note that this
is not the case in the standard form store-64 instructions.


2.2.3.1 XLFP and XSFP size encodings 


In 16 
any | ee 


FP Sng (S 2 


epwy | 








2.2.4 CP1 (FLOATING POINT) INSTRUCTION FORMATS


See the floating-point instruction description section.


2.2.5 CP4 (MMX) INSTRUCTIONS


Coprocessor 4 primary opcodes are used for implementation of the MMX instruction set. Please
refer to the MMX section for more detail.




CHAPTER 3





GENERAL INSTRUCTIONS 


This chapter describes the instructions that operate on the general purpose registers (GPRs) as well 
as the additional temporary registers. 


3.1 ALU INSTRUCTIONS 


3.1.1 IMMEDIATE INSTRUCTIONS 


Usage Note: The following table summarizes how each 16-bit immediate logical instruction
modifies the upper and lower halves of RT:


Instruction   Upper RT              Lower RT
ANDI          0                     RS(15:0) & Immed
ANDIL         RS(31:16)             RS(15:0) & Immed
ANDIU         RS(31:16) & Immed     RS(15:0)
ORI           RS(31:16)             RS(15:0) | Immed
ORIU          RS(31:16) | Immed     RS(15:0)
XORI          RS(31:16)             RS(15:0) ^ Immed
XORIU         RS(31:16) ^ Immed     RS(15:0)
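
The immediate instructions defined in this section can be modeled directly from their Operation lines. The sketch below is illustrative rather than normative: the table name `IMMEDIATE_OPS` and the function `run_immediate` are hypothetical, and the wrap of ADDI results to 32 bits is an assumption, since overflow behavior is not stated.

```python
MASK32 = 0xFFFFFFFF

# Each entry maps a mnemonic to a function of (GPR[RS], IMMEDIATE) -> GPR[RT].
IMMEDIATE_OPS = {
    "ADDI":  lambda rs, imm: (rs + imm) & MASK32,  # wrap to 32 bits (assumed)
    "ANDI":  lambda rs, imm: rs & imm,
    "ANDIL": lambda rs, imm: rs & (0xFFFF0000 | imm),
    "ANDIU": lambda rs, imm: rs & ((imm << 16) | 0x0000FFFF),
    "ORI":   lambda rs, imm: rs | imm,
    "ORIU":  lambda rs, imm: rs | (imm << 16),
    "XORI":  lambda rs, imm: rs ^ imm,
    "XORIU": lambda rs, imm: rs ^ (imm << 16),
}


def run_immediate(mnemonic: str, rs_value: int, imm: int) -> int:
    """Return the value written to GPR[RT] for one immediate instruction."""
    if not 0 <= imm <= 0xFFFF:
        raise ValueError("immediate must fit in 16 bits")
    if not 0 <= rs_value <= MASK32:
        raise ValueError("register value must fit in 32 bits")
    return IMMEDIATE_OPS[mnemonic](rs_value, imm)
```

A 32-bit constant can be built in two steps with the upper and lower forms, for example ORIU from GPR 0 (which always reads as zero) followed by ORI on the result.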


3.1.1.1 ADDI - Add Immediate


Encoding
         31:26    25:21   20:16   15:0
ADDI:    001001   RS      RT      IMMEDIATE
         6        5       5       16
Format
Description
The contents of GPR RS are added to the IMMEDIATE field extended with 0x0000. The result is
written back to GPR RT.


Operation
GPR[RT] = GPR[RS] + IMMEDIATE;























Flags Setting 


None 


Exceptions 


None 


3.1.1.2 ANDI - AND Immediate
Encoding
         31:26    25:21   20:16   15:0
ANDI:    001100   RS      RT      IMMEDIATE
         6        5       5       16
Format
ANDI RT,RS,0x1234
Description
Zero-extends the IMMEDIATE field and ANDs it with 32-bit GPR RS, with the result replacing
32-bit GPR RT.
Operation
GPR[RT] = GPR[RS] & IMMEDIATE;
Flags Setting 
None 
Exceptions 
None 


3.1.1.3 ANDIL - AND Immediate Lower


Encoding
         31:26    25:21   20:16   15:0
ANDIL:   001011   RS      RT      IMMEDIATE
         6        5       5       16
Format
ANDIL RT,RS,0x1234
Description
The contents of GPR RS are ANDed with the IMMEDIATE field extended with 0xFFFF. The result
is written back to GPR RT.
Operation
GPR[RT] = GPR[RS] & (0xFFFF0000 | IMMEDIATE);








Flags Setting 


None 


Exceptions 


None 


3.1.1.4 ANDIU - AND Immediate Upper
Encoding
         31:26    25:21   20:16   15:0
ANDIU:   001010   RS      RT      IMMEDIATE
         6        5       5       16
Format
ANDIU RT,RS,0x6789
Description
The contents of GPR RS are ANDed with the immediate field shifted left 16 and extended on the
right with 0xFFFF. The result is written back to GPR RT.


Operation
GPR[RT] = GPR[RS] & ((IMMEDIATE << 16) | 0x0000FFFF);





Flags Setting 


None 


Exceptions 


None 


3.1.1.5 ORI - OR Immediate


Encoding
         31:26    25:21   20:16   15:0
ORI:     001101   RS      RT      IMMEDIATE
         6        5       5       16
Format
ORI RT,RS,0x1234
Description
Zero-extends the IMMEDIATE field and ORs it with 32-bit GPR RS, with the result replacing
32-bit GPR RT.
Operation
GPR[RT] = GPR[RS] | IMMEDIATE;

















Flags Setting 


None 


Exceptions 


None 


3.1.1.6 ORIU - OR Immediate Upper
Encoding
         31:26    25:21   20:16   15:0
ORIU:    001000   RS      RT      IMMEDIATE
         6        5       5       16
Format
ORIU RT,RS,0x1234
Description
The contents of GPR RS are ORed with the immediate field shifted left 16. The result is written
back to GPR RT.
Operation
GPR[RT] = GPR[RS] | (IMMEDIATE << 16);


Flags Setting 


None 


Exceptions 


None 


3.1.1.7 XORI - XOR Immediate


Encoding
         31:26    25:21   20:16   15:0
XORI:    001110   RS      RT      IMMEDIATE
         6        5       5       16
Format
XORI RT,RS,0x1234
Description
Zero-extends the IMMEDIATE field and XORs it with 32-bit GPR RS, with the result replacing
32-bit GPR RT.
Operation
GPR[RT] = GPR[RS] ^ IMMEDIATE;














Flags Setting 


None 


Exceptions 


None 


3.1.1.8 XORIU - XOR Immediate Upper
Encoding
         31:26    25:21   20:16   15:0
XORIU:   001111   RS      RT      IMMEDIATE
         6        5       5       16
Format
XORIU RT,RS,0x1234
Description
The contents of GPR RS are XORed with the immediate field shifted left 16. The result is written
back to GPR RT.
Operation
GPR[RT] = GPR[RS] ^ (IMMEDIATE << 16);


Flags Setting 


None 


Exceptions 


None 


3.1.2 X86-SEMANTIC INSTRUCTIONS


3.1.2.1 XADC[I][R] - X86 Add with Carry 


Encoding 
          31:26    25:21   20:16   15:11   10:8   7:5      4:0
XADC:     100000   RS      RT      RD             DPcntl   ADC
XADCR:    100010                                           10001


          31:26    25:21   20:16   15:11   10:8   7:5      4:0
XADCI:    100001   RS      Const   RD             DPcntl   ADC
XADCIR:   100011                                           10001


Format 




















Description 


The contents of GPR RS are added to either the contents of GPR RT or the immediate value
specified in Const, depending on the primary opcode. EFLAGS.CF is also added to the result.


Operation 
GPR[RD] = GPR[RS] + GPR[RT] + CF 
GPR[RD] = GPR[RS] + Const + CF 
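
The add-with-carry operation can be sketched as follows. This is an illustrative model with assumptions: `xadc` is a hypothetical helper, the `bits` parameter stands in for the operand size selected by DPcntl, and the returned carry-out follows ordinary x86 ADC behavior, since the CCarith flag group is not defined in this section. Other flags and the partial-register write-back are not modeled.

```python
from typing import Tuple


def xadc(rs: int, operand: int, cf: int, bits: int = 32) -> Tuple[int, int]:
    """Return (result, carry_out) for result = rs + operand + CF.

    operand is GPR[RT] for XADC[R] or the Const value for XADCI[R].
    """
    if bits not in (8, 16, 32):
        raise ValueError("operand size must be 8, 16, or 32 bits")
    mask = (1 << bits) - 1
    total = (rs & mask) + (operand & mask) + (cf & 1)
    return total & mask, total >> bits
```

Chaining two calls gives a 64-bit addition: the carry-out of the low-word add is passed as CF to the high-word add.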





Flags Setting 
CCarith 


Exceptions 


None 


3.1.2.2 XADD[I][R] - X86 Add


Encoding
          31:26    25:21   20:16   15:11   10:8   7:5      4:0
XADD:     100000   RS      RT      RD             DPcntl   ADD
XADDR:    100010                                           10000


          31:26    25:21   20:16   15:11   10:8   7:5      4:0
XADDI:    100001   RS      Const   RD             DPcntl   ADD
XADDIR:   100011                                           10000


Format 
XADD RD,RS,RT 
XADDI RD,RS,2 














Description 
The contents of GPR RS are added to either the contents of GPR RT or the immediate value
specified in Const, depending on the primary opcode.
Operation 
GPR[RD] = GPR[RS] + GPR[RT] 
GPR[RD] = GPR[RS] + Const 








Flags Setting 
CCarith 


Exceptions 


None. 


3.1.2.3 XAND[I][R] - X86 And


Encoding
          31:26    25:21   20:16   15:11   10:8   7:5      4:0
XAND:     100000   RS      RT      RD             DPcntl   AND
XANDR:    100010                                           10100


          31:26    25:21   20:16   15:11   10:8   7:5      4:0
XANDI:    100001   RS      Const   RD             DPcntl   AND
XANDIR:   100011                                           10100


Format 
XAND RD,RS,RT 
XANDI RD,RS,2 





Description 
The contents of GPR RS are logically ANDed with either the contents of GPR RT or the immediate
value specified in Const, depending on the primary opcode.

Operation 


GPR[RD] = GPR[RS] & GPR[RT] 
GPR[RD] = GPR[RS] & Const 





Flags Setting 
CClog 


Exceptions 


None. 


3.1.2.4 XCMPS - X86 A-Stage Compare String


Encoding 
31:26 25:21 20:16 15:11 10:8 


01001 


Format 
XCMPS  RS,RT 





Description 


If the COUNT register modulo TSR.AS is not zero then it is decremented by one. The contents
of GPR RS are compared to the contents of GPR RT according to the size specified
by the DPcntl field as described in the following table. If any of the conditions to terminate or
interrupt an x86 string compare are true, then the instruction is nullified, the translator is
signalled to stop generating the sequence for a repeat string, and control is transferred to
microcode. A string compare terminates when the COUNT register modulo TSR.AS is zero, when the
comparison results in equality and TSR.REPN is set, or when the comparison results in inequality
and TSR.REPN is clear. A string compare is interrupted by an external interrupt or debug trap. If
the string compare is interrupted or a debug trap is triggered before the terminating condition is
met, then hardware will set XCR[STRINT_BIT] at the .T that completes the current x86 instruction;
IP is not advanced in this case. See the operation description below.


pPcnt 
Pen Pe |e 


001, 
010, 


110, 








Usage Note: XCMPS is needed to allow the translator to generate the instruction sequence for x86
string compares and scan strings.


A sample usage is
MTCNT ECX // mod AS automatically
XPOP.8L.AS tmp2,DSdesc,ESI,1,str_testcnt
XPOP.8L.AS tmp3,ESdesc,EDI,1,str_testcnt
XCMPS.8L tmp2,tmp3


which loads the COUNT register with CX or ECX based on the address size, then checks for
COUNT equal to zero and, if not zero, loads a byte from DS:[E]SI and from ES:[E]DI and
compares the two bytes. The loads and the compare are repeatedly generated by the translator until
COUNT goes to zero, the terminating condition of equality is reached, or there is an interrupt or
debug trap. Note that XCMPS does not modify EFLAGS; microcode must determine if the original
count in [E]CX was non-zero and set EFLAGS in that case.






Operation 
if (COUNT.AS != 0) {
    COUNT = COUNT - 1;
}
REP_DONE = (COUNT.AS == 0 or not((GPR[RS] == GPR[RT]) ^ TSR.REPN));
if (REP_DONE or interrupt or IT or TF or DBRKPT) {
    send signal to xlator to stop;
    nullify inst;
    nullify pipeline;
    transfer control to corresponding microcode entry point;

    //--The following action taken at .T that completes current x86 instruction
    if (REP_DONE == 0) {
        // interrupted before the terminating condition: IP is not advanced
        XCR[STRINT_BIT] = 1;
    }
}
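
The termination test in the operation above can be checked with a small truth-table model. This is a sketch, not the hardware definition: `rep_done` is a hypothetical helper, and it reads the operator between the comparison result and TSR.REPN as exclusive OR, which is the reading that matches the termination conditions given in the Description.

```python
def rep_done(count: int, equal: bool, repn: bool) -> bool:
    """Return True when the repeated string compare must stop.

    count -- COUNT (already reduced modulo the address size)
    equal -- result of GPR[RS] == GPR[RT]
    repn  -- TSR.REPN
    """
    return count == 0 or not (equal ^ repn)
```

With TSR.REPN set, the sequence continues only while the operands differ; with TSR.REPN clear, it continues only while they are equal.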





Flags Setting 


None 


Exceptions 


None. 


3.1.2.5 XDECIR - X86 Decrement


Encoding
31:26 25:21 20:16 15:11 10:8


01010


Format
XDECIR RD,RS,1
Description
The immediate value specified in Const is subtracted from the contents of GPR RS.


Usage Note: XDECIR is needed to perform a single-cycle x86 DEC function. It is the same as an
XSUBIR instruction (subtract immediate) except that it sets EFLAGS differently than XSUBIR.


Operation 
GPR[RD] = GPR[RS] - Const 


Flags Setting 
CCinc 


Exceptions 


None.


3.1.2.6 XIDIV - X86 Signed Divide Step





Encoding 
31:26 25:21 20:16 15:11 10:9 8 7:5 4:0
XIDIV: 100000, 00000, Cntl IDIV
01110
6 5 5 5 2 1 3 5
Format 
Description 
Performs a signed divide of the dividend (loaded by DMTMD.DVD) by the divisor (loaded by
DMTMD.DVS). The sizes of the dividend, divisor, and quotient are specified in the instruction and
summarized in the following table. The dividend and divisor must be loaded using the
DMTMD.DIV.size instruction before issuing XIDIV. For a 64b-by-32b divide, the dividend is
loaded separately using the DMTMD.DIVidend.32 instruction. XIDIV does not allow register
indirection or (immediate) values. XIDIV computes as many quotient bits as are specified in the
table below. The quotient is written to the LO register and the remainder is written to the HI
register. A MFLOU.size instruction is used to get the quotient, and a MFHI instruction is used to
get the remainder. This DIVIDE instruction is a divide step instruction that produces one bit of
result per cycle. Unlike previous implementations, C3 microcode will control all steps of the
divide process.
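
To make the idea of a one-bit-per-step divide concrete, here is a minimal C sketch of a restoring
division loop. It is an assumption-laden illustration, not the C3 algorithm: it is unsigned only,
the names HiLo, divide_step, and divide_u32 are hypothetical, and the overflow detection and
adjust steps are not modeled. The divisor is assumed to be non-zero.

```c
#include <stdint.h>

typedef struct { uint32_t lo; uint32_t hi; } HiLo;  /* LO = quotient, HI = remainder */

/* One restoring-division step: shift in the next dividend bit, try to
 * subtract the divisor, and record one quotient bit. */
static void divide_step(HiLo *r, uint32_t *dividend, uint32_t divisor)
{
    uint64_t rem = ((uint64_t)r->hi << 1) | (*dividend >> 31);
    *dividend <<= 1;
    r->lo <<= 1;
    if (rem >= divisor) {
        rem -= divisor;
        r->lo |= 1u;
    }
    r->hi = (uint32_t)rem;
}

/* Driver standing in for the microcode loop: one step per quotient bit. */
static HiLo divide_u32(uint32_t dividend, uint32_t divisor)
{
    HiLo r = { 0u, 0u };
    for (int bit = 0; bit < 32; bit++)
        divide_step(&r, &dividend, divisor);
    return r;
}
```

Each call to divide_step corresponds to one cycle of work; the driver loop plays the role that
microcode would play in sequencing the steps.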


General Divide Step
Detect Divide Ovf
Remainder Adjust
Quotient Adjust








Flags Setting 


None 


Exceptions 


None


3.1.2.7 XIMUL[I] - X86 Signed Multiply Step


Encoding 
31:26 25:21 20:16 15:11 10:9


01100 





31:26 25:21 20:16 15:11 10:9 8 7:5 4:0 
XIMULI: , | RS | Const | 00000,|} xx, . Cntl | IMUL 
01100 
6 5 5 5 2 1 3 5 
Format 
XIMUL RS,RT 





XIMULI RS,2


Description 
Performs a signed multiply of the operands in the MUL operand registers. The product is written
to the LO register. The sizes of the factors and product are specified in the instruction and
summarized in the following table. XIMUL does not allow register indirection. If an immediate is
used it must be coded as IMMED.


First Multiply Clock (choose ops)
LastClk Load result into LO





Operation 


Flags Setting 


None 


Exceptions 


None


3.1.2.8 XMUL - X86 Unsigned Multiply Step


Encoding 
31:26 25:21 20:16 15:11 10:9


01101 


Format 
XMUL RS,RT 





Description 
Performs an unsigned multiply of RS and RT. The product is written to the LO register. The sizes
of the factors and product are specified in the instruction and summarized in the following table.
XMUL does not allow register indirection. If an immediate is used it must be coded as IMMED.


First Multiply Clock (choose ops) 


LastClk Load result into LO 





Operation 


Flags Setting 


None. 


Exceptions 


None


3.1.2.9 XINCIR - X86 Increment


Encoding 
31:26 25:21 20:16 15:11 10:8 


01000 


Format 
XINCIR RD,RS,1 





Description 
The immediate value specified in Const is added to the contents of GPR RS, and the result is
written to GPR RD.


Usage Note: XINCIR is needed to perform a single-cycle X86 INC function. It is the same as an
XADDIR instruction (add immediate) except that it sets EFLAGS differently than XADDIR.


Operation 
GPR[RD] = GPR[RS] + Const 





Flags Setting 
CCinc 


Exceptions 


None.


3.1.2.10 X8NOR[I][R] - X86 NOR


Encoding 
31:26 25:21 20:16 15:11 10:8


X8NORR: 100010, 10111 


31:26 25:21 20:16 15:11 10:8


X8NORI: 100001, Const DPcntl NOR 
X8NORIR: 100011, 10111 

















Format 
X8NOR RD,RS,RT 
X8NORI RD,RS,0 // Same as RD = NOT(RS)
Description 
The contents of GPR RS are logically NORed with either the contents of GPR RT or the immediate
value specified in Const, depending on the primary opcode.


Usage Note: By specifying a constant of zero, this instruction directly performs an X86 NOT
instruction function. This is the intended function of this instruction.





Operation 
GPR[RD] = ~ (GPR[RS] | GPR[RT]) 
GPR[RD] = ~ (GPR[RS] | Const) 
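
The NOT-by-NOR idiom from the usage note can be checked with a few lines of C. The function name
nor32 is hypothetical and the sketch assumes a full 32-bit operation.

```c
#include <stdint.h>

/* NOR of a register value with a constant, as in the immediate form. */
static uint32_t nor32(uint32_t rs, uint32_t konst)
{
    return ~(rs | konst);
}
```

Calling nor32 with a constant of zero returns the bitwise complement of rs, which is the X86 NOT
function; any other constant additionally clears the bits it names.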


Flags Setting 


None 


Exceptions 


None


3.1.2.11 X8OR[I][R] - X86 OR


Encoding 
31:26 25:21 20:16 15:11 10:8


X8ORR: 100010, 10101 


31:26 25:21 20:16 15:11 10:8


X8ORI: 100001, Const DPcntl OR 
X8ORIR: 100011, 10101 











Format 
X8OR RD,RS,RT
X8ORI RD,RS,0 // Moves RS to RD
Description
The contents of GPR RS are logically ORed with either the contents of GPR RT or the immediate
value specified in Const, depending on the primary opcode.
Operation 
GPR[RD] = GPR[RS] | GPR[RT] 
GPR[RD] = GPR[RS] | Const 





Flags Setting 
CClog 


Exceptions 


None.


3.1.2.12 XRCL - X86 Rotate Left Thru Carry


Encoding 
31:26 25:21 20:16 15:11 10:8


XRCLIR: 100011, 00110 


Format 
XRCLI RD,RS 





Description 
This instruction implements a rotate left through carry of a GPR. The low-order portion of GPR RS
(as defined by the operand size in DPcntl), with the carry flag concatenated on the left end, is
rotated left by one bit.


Usage Note: This instruction directly performs a one-bit X86 RCL instruction. The multi-bit
X86 instructions are trapped to the microcode and performed one bit at a time.


Architecture Note: This instruction only rotates through carry one bit because (1) it is much 
harder to do multi-bit rotates through carry, and (2) the multi-bit forms are rarely used in X86. 





Operation 
Temp_bit = EFLAGS.CF
EFLAGS.CF = MSB(GPR[RS]) // MSB = most significant bit
GPR[RD] = (GPR[RS] << 1) | Temp_bit
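
The one-bit rotate through carry can be sketched in C for the three operand sizes. The name rcl1
and the width parameter are hypothetical, and leaving the bits above the operand size unchanged is
an assumption of this sketch rather than something the text states.

```c
#include <stdbool.h>
#include <stdint.h>

/* Rotate the low `width` bits (8, 16, or 32) of `value` left by one bit
 * through the carry flag. Bits above `width` are returned unchanged,
 * which is an assumption of this sketch. */
static uint32_t rcl1(uint32_t value, int width, bool *cf)
{
    uint32_t mask = (width == 32) ? 0xFFFFFFFFu : ((1u << width) - 1u);
    uint32_t old_cf = *cf ? 1u : 0u;
    uint32_t low = value & mask;

    *cf = (low >> (width - 1)) & 1u;        /* MSB moves into CF */
    low = ((low << 1) | old_cf) & mask;     /* old CF enters bit 0 */
    return (value & ~mask) | low;
}
```

A multi-bit rotate, as performed by the trapped path, would simply call this step the required
number of times.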








Flags Setting 
CCrl 


Exceptions 


None.


3.1.2.13 XRCR - X86 Rotate Right Thru Carry
Encoding 
31:26 25:21 20:16 15:11 10:8 4:06 
XRCRIR: 100011, 00111 
Format 


XRCRI RD, RS 





Description 
This instruction implements a rotate right through carry of a GPR. The low-order portion of
GPR RS (as defined by the operand size in DPcntl), with the carry flag concatenated on the right
end, is rotated right by one bit.


Usage Note: This instruction directly performs a one-bit X86 RCR instruction. The multi-bit
X86 instructions are trapped to the emulator and performed one bit at a time.


Architecture Note: This instruction only rotates through carry one bit because (1) it is much
harder to do multi-bit rotates through carry, and (2) the multi-bit forms are rarely used in X86.





Performance 


If there is no data dependency stall, this instruction executes in one clock. 


Operation 





Temp = EFLAGS.CF << N // N depends on the operand size
EFLAGS.CF = GPR[RS] & 1
GPR[RD] = Temp | (GPR[RS] >> 1)








Flags Setting 
CCrr 


Exceptions 


None.


3.1.2.14 XROL[I][R] - X86 Rotate Left


Encoding 
31:26 25:21 20:16 15:11 10:8


XROLR: 100010, 00100


31:26 25:21 20:16 15:11 10:8


XROLIR: 100011, 00100


Format 
XROL RD,RS,RT 
XROLI RD,RS,2


Description
This instruction implements a rotate left of a GPR. The low-order portion of GPR RS, as defined
by the operand size in DPcntl, is rotated left by the value in GPR RT or Const, modulo 32.
Operation 
Temp = XXXXXX 





Flags Setting 
CCrl 


Exceptions 


None.


3.1.2.15 XROR[I][R] - X86 Rotate Right


Encoding 
31:26 25:21 20:16 15:11 10:8 


XRORR: 100010, 00101 


31:26 25:21 20:16 15:11 10:8


XRORI: 100001, RS Const DPcntl ROR 
XRORIR: 100011, 00101 


Format 
XROR RD,RS,RT 
XRORI RD,RS,2








Description 


This instruction implements a rotate right of a GPR. The low-order portion of GPR RS, as defined
by the operand size in DPcntl, is rotated right by the value in GPR RT or Const, modulo 32.


Operation 
Temp = XXXXXX 





Flags Setting 
CCrr


Exceptions 


None.


3.1.2.16 XSBB[I][R] - X86 Subtract with Borrow


Encoding 
31:26 25:21 20:16 15:11 10:8


XSBBR: 100010, 10011 


31:26 25:21 20:16 15:11 10:8


XSBBI: 100001, RS Const DPcntl SBB 
XSBBIR: 100011, 10011 


Format 
XSBB RD,RS,RT 
XSBBI RD,RS,2 








Description 


The contents of GPR RT or the immediate specified in Const, depending on the primary opcode,
are subtracted from the contents of GPR RS, together with the carry flag (the borrow).


Operation
GPR[RD] = GPR[RS] - GPR[RT] - EFLAGS.CF
GPR[RD] = GPR[RS] - Const - EFLAGS.CF





Flags Setting 
CCarith 
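
The usual reason for a subtract with borrow is multi-word arithmetic. The C sketch below chains
two 32-bit steps into a 64-bit subtract; the names sbb32 and sub64 are hypothetical, and the cf
variable merely stands in for EFLAGS.CF.

```c
#include <stdbool.h>
#include <stdint.h>

/* 32-bit subtract with borrow-in; *cf carries the borrow in and out. */
static uint32_t sbb32(uint32_t rs, uint32_t rt, bool *cf)
{
    uint64_t wide = (uint64_t)rs - rt - (*cf ? 1u : 0u);
    *cf = (wide >> 32) != 0;        /* any wrap past 32 bits means a borrow */
    return (uint32_t)wide;
}

/* 64-bit subtract built from two 32-bit steps. */
static uint64_t sub64(uint64_t a, uint64_t b)
{
    bool cf = false;                /* first step behaves like a plain subtract */
    uint32_t lo = sbb32((uint32_t)a, (uint32_t)b, &cf);
    uint32_t hi = sbb32((uint32_t)(a >> 32), (uint32_t)(b >> 32), &cf);
    return ((uint64_t)hi << 32) | lo;
}
```

The low-word step starts with the borrow clear, and the high-word step consumes the borrow the
low word produced.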


Exceptions 


None.


3.1.2.17 XSETCC - X86 SETcc


Encoding 
31:26 25:21 20:16 15:11 10:8 7:5 4:0
XSETCC: 100000, | 00000, | 00000, | RD | 000, | DPcntl | XSETCC
11101
6 5 5 5 3 3 5


Format 
XSETCC RD 





Description 


The contents of GPR RS (expected to be coded as 0) are logically ORed with the contents of GPR
RT (also expected to be coded as 0). The low-order bit of the result is then logically ORed with
the condition specified by the tttn field of the IIR.


Architecture Note: This instruction is intended for the translator to implement the x86 SETcc
instruction. Its use in microcode is limited by the fact that the condition to test is specified in
the tttn field of IIR.





Architecture Differences: this instruction is new to C2. 
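
For readers unfamiliar with the tttn field, the following C sketch evaluates the standard x86
condition encoding (three test bits plus a negate bit) against a set of flags. The Flags structure
and the names eval_tttn and setcc are hypothetical; how the IIR delivers tttn to the hardware is
not modeled.

```c
#include <stdbool.h>
#include <stdint.h>

typedef struct { bool cf, pf, zf, sf, of; } Flags;

/* Evaluate a 4-bit x86 condition code (tttn). The upper three bits pick
 * the test and the low bit negates it. */
static bool eval_tttn(uint8_t tttn, Flags f)
{
    bool r;
    switch ((tttn >> 1) & 7) {
    case 0:  r = f.of; break;                       /* O  */
    case 1:  r = f.cf; break;                       /* B  */
    case 2:  r = f.zf; break;                       /* E  */
    case 3:  r = f.cf || f.zf; break;               /* BE */
    case 4:  r = f.sf; break;                       /* S  */
    case 5:  r = f.pf; break;                       /* P  */
    case 6:  r = (f.sf != f.of); break;             /* L  */
    default: r = f.zf || (f.sf != f.of); break;     /* LE */
    }
    return (tttn & 1) ? !r : r;
}

/* SETcc-style result: 1 if the condition holds, otherwise 0. */
static uint32_t setcc(uint8_t tttn, Flags f)
{
    return eval_tttn(tttn, f) ? 1u : 0u;
}
```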


Operation 




Flags Setting 


None. 


Exceptions 


None.


3.1.2.18 XSHL[I][R] - X86 Shift Left Logical


Encoding 
31:26 25:21 20:16 15:11 10:8


XSHLR: 100010, 00000 


31:26 25:21 20:16 15:11 10:8 
XSHLI: 100001, RS Const DPcntl SLL 
XSHLIR: 100011, 00000


Format 
XSHL RD,RS,RT 
XSHLI RD,RS,2 

















Description 
This instruction implements a shift left of a GPR. The low-order portion of GPR RS, as defined by
the operand size in DPcntl, is shifted left by the value in GPR RT or Const, modulo 32.

Operation 





GPR[RD] = xxxxxx 


Flags Setting 
CCshl 


Exceptions 


None.


3.1.2.19 XSAR[I][R] - X86 Shift Right Arithmetic


Encoding 
31:26 25:21 20:16 15:11 10:8 


XSARR: 100010, 00011 


31:26 25:21 20:16 15:11 10:8 
XSARI: 100001, RS Const DPcntl SRA 
XSARIR: 100011, 00011 


Format 
XSAR RD,RS,RT 
XSARI RD,RS,2 








Description 


This instruction implements a shift right arithmetic of a GPR. The low-order portion of GPR RS,
as defined by the operand size in DPcntl, is shifted right arithmetically by the value in GPR RT or
Const, modulo 32.


Operation
GPR[RD] = xxxxxx


Flags Setting 
CCsar 


Exceptions 


None.


3.1.2.20 XSHR[I][R] - X86 Shift Right Logical


Encoding 
31:26 25:21 20:16 15:11 10:8


XSHRR: 100010, 00010 


31:26 25:21 20:16 15:11 10:8
XSHRI: 100001, RS Const DPcntl SHR 
XSHRIR: 100011, 00010 


Format 
XSHR RD,RS,RT 
XSHRI RD,RS,2 

















Description 
This instruction implements a shift right logical of a GPR. The low-order portion of GPR RS, as
defined by the operand size in DPcntl, is shifted right logically by the value in GPR RT or Const,
modulo 32.

Operation


GPR[RD] = xxxxxx


Flags Setting 
CCshr 


Exceptions 


None.


3.1.2.21 XSUB[I][R] - X86 Subtract


Encoding 
31:26 25:21 20:16 15:11 10:8 


XSUBR: 100010, 10010 


31:26 25:21 20:16 15:11 10:8


XSUBI: 100001, RS Const DPcntl SUB 
XSUBIR: 100011, 10010 


Format 
XSUB RD,RS,RT 
XSUBI RD,RS,2








Description 
The contents of GPR RT or the immediate specified in Const, depending on the primary opcode,
are subtracted from the contents of GPR RS.
Operation 
GPR[RD] = GPR[RS] - GPR[RT] 
GPR[RD] = GPR[RS] - Const 





Flags Setting 
CCarith 


Exceptions 


None.


3.1.2.22 X8XOR[I][R] - X86 XOR


Encoding 
31:26 25:21 20:16 15:11 10:8


X8XORR: 100010, 10110 


31:26 25:21 20:16 15:11 10:8


X8XORI: 100001, RS Const DPcntl XOR 
X8XORIR: 100011, 10110 








Format 
X8XOR RD,RS,RT 
X8XORI RD,RS,2 
Description 
The contents of GPR RS are logically XORed with either the contents of GPR RT or the immediate
value specified in Const, depending on the primary opcode.
Operation


GPR[RD] = GPR[RS] ^ GPR[RT]
GPR[RD] = GPR[RS] ^ Const





Flags Setting 
CClog 


Exceptions 


None.


3.2 EFLAGS UPDATE FORMS


The EFLAGS update forms specify how the EFLAGS register is modified by certain XALU
instructions. These update forms specify how the CF, PF, AF, ZF, SF, and OF flags are modified;
there is no hardware support for modifying the other bits in EFLAGS. The following sections
specify the exact semantics of each EFLAGS update form.


The following table shows the EFLAGS update forms along with the corresponding operations
performed on the flag bits. An _ indicates the flag is modified, a - indicates the flag is
specified in X86 as undefined (the actual value used on a C1 is the same as on a Pentium and is
documented in the detail descriptions), and a blank indicates that the flag is unmodified.


Update form   X86 Instructions                           C1 Instructions
CCnop         NOT MENTIONED BELOW                        NOT MENTIONED BELOW
CCarith       ADC, ADD, CMP, CMPS, CMPXCHG, NEG, RSM,    XADC, XADD, XSBB, XSUB
              SBB, SCAS, SUB, XADD
CClog         AND, OR, TEST, XOR                         XAND, X8OR, X8XOR
CCshl         SAL 1, SAL N, SHL 1, SHL N                 XSHL
CCshr         SAR 1, SAR N, SHR 1, SHR N                 XSHR, XSAR
CCrl          RCL 1, RCL N, ROL 1, ROL N                 XRCL, XROL
CCrr          RCR 1, RCR N, ROR 1, ROR N                 XRCR, XROR

[The per-flag columns and four further rows of this table are illegible in this copy.]


Default Flag Setting 























In most of the following descriptions Result is the 32 bit result of an operation, and CF3, CF6,
CF7, CF14, CF15, CF30, and CF31 are the carry outs from bits 3, 6, 7, 14, 15, 30, and 31 of the
adder respectively. DP8, DP16, and DP32 are encodings of the DPcntl field specifying that the
operation is 8, 16, or 32 bits wide. The variable SubOp indicates that the instruction is a subtract.


For the shift and rotate flag setting operations, result is the 34 bit output of the shifter. This
allows the code to specify result[-1] and result[33], which are required to generate the carry out
for the shift and rotate instructions. The high and low bits (33 and -1) only go to the condition
code unit; they never appear in registers. The semantics of the flag operations are specified in a
pseudo-C language.


CF - Carry Flag 


Set on a carry out or not a borrow from the high-order bit of an add or subtract calculation respec- 
tively; cleared otherwise. The high-order bit is determined by the DPcntl field in the extended 
opcode which specifies bit 7 for eight bit operations, bit 15 for sixteen bit operations, and bit 31 
for thirty-two bit operations. 





if DP8
CF = SubOp ^ CF7;
if DP16
CF = SubOp ^ CF15;
if DP32
CF = SubOp ^ CF31;














PF - Parity Flag 


Set if the low order eight bits of the result have an even number of ones; cleared otherwise. 
PF = !(Result[7] ^ Result[6] ^ Result[5] ^ Result[4] ^
       Result[3] ^ Result[2] ^ Result[1] ^ Result[0]);


AF - Auxiliary Carry Flag
Set on a carry out of bit 3 or not a borrow from bit 3 of an add or subtract calculation
respectively; cleared otherwise.
AF = SubOp ^ CF3;


ZF - Zero Flag 


Set if the result of an operation modulo the operation size, specified in DPcntl, is all zeros; cleared 
otherwise. 
if DP8
ZF = (Result[7:0] == 0);
if DP16
ZF = (Result[15:0] == 0);
if DP32
ZF = (Result[31:0] == 0);


SF - Sign Flag 


Set to equal the high-order bit of the result. The high order bit of the result is specified by the size 
in DPcntl. 

if DP8
SF = Result[7];
if DP16
SF = Result[15];
if DP32
SF = Result[31];


OF - Overflow Flag

Set if the carry outs of the high-order two bits differ. The high-order bit of the result is
specified by the size in DPcntl. Note that for adds and subtracts, OF may also be determined from
the signs of the result versus the sign of the sources.

if DP8
OF = CF7 ^ CF6;
if DP16
OF = CF15 ^ CF14;
if DP32
OF = CF31 ^ CF30;








CCarith 


The CCarith flags setting causes all the flags to be set as described in the Default Flag Setting
section above.
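
The default flag rules can be exercised with a small C model. This is a sketch under stated
assumptions: the names Flags, carry_out, and default_flags are hypothetical, a subtract is modeled
as a + ~b + 1 so that the adder carry outs exist for both operations, and width stands in for the
DPcntl size (8, 16, or 32).

```c
#include <stdbool.h>
#include <stdint.h>

typedef struct { bool cf, pf, af, zf, sf, of; } Flags;

/* Carry out of bit n (0..31) when computing a + b + cin. */
static bool carry_out(uint32_t a, uint32_t b, bool cin, int n)
{
    uint64_t mask = (1ull << (n + 1)) - 1;
    uint64_t sum = (a & mask) + (b & mask) + (cin ? 1u : 0u);
    return (sum >> (n + 1)) & 1;
}

/* Default flag setting for an add (sub_op false) or subtract (sub_op true). */
static Flags default_flags(uint32_t a, uint32_t b, bool sub_op, int width)
{
    uint32_t addend = sub_op ? ~b : b;          /* subtract as a + ~b + 1 */
    bool cin = sub_op;
    uint32_t result = a + addend + (cin ? 1u : 0u);
    int hi = width - 1;
    uint32_t low8 = result & 0xFFu;
    Flags f;

    f.cf = sub_op ^ carry_out(a, addend, cin, hi);
    f.af = sub_op ^ carry_out(a, addend, cin, 3);
    f.of = carry_out(a, addend, cin, hi) ^ carry_out(a, addend, cin, hi - 1);
    f.zf = (width == 32) ? (result == 0)
                         : ((result & ((1u << width) - 1u)) == 0);
    f.sf = (result >> hi) & 1u;
    low8 ^= low8 >> 4;                          /* fold eight bits down to one */
    low8 ^= low8 >> 2;
    low8 ^= low8 >> 1;
    f.pf = !(low8 & 1u);                        /* set on an even number of ones */
    return f;
}
```

For example, an 8-bit add of 0x7F and 0x01 yields 0x80 with OF and SF set and CF clear, while an
8-bit subtract of 1 from 0 sets CF because the adder produces no carry out of bit 7.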























CCinc 


The CCinc flags setting causes all the flags except the CF to be set as described in the Default Flag 
Setting section above. The CF is unmodified. 























CClog 
The CClog flags setting causes the SF, ZF, and PF to be set as described in the Default Flag
Setting section above. The CF and OF are cleared. AF is architected as undefined; its real behavior
is that it is cleared.
SF = defaultSF;
ZF = defaultZF;
PF = defaultPF;
CF = 0;
OF = 0;
AF = 0; // Verified on Pentium


CCnop


The CCnop flags setting leaves all flags unchanged.


CCshl


Note the special setting of CF and OF for byte shifts with 
shift counts greater than operand size. 
if (shiftCnt != 0) {
    ZF = defaultZF;
    PF = defaultPF;
    SF = defaultSF;
    AF = 1;
    if (shiftCnt >= operand size)
        CF = ((shiftCnt - operand size) == 0)
             & bit 0 of original data;
    else
        CF = result[operand size];
OF = 3 10ns 





if (shiftCnt != 0) {
    ZF = defaultZF;
    PF = defaultPF;
    SF = defaultSF;
    AF = 1;
    OF = 0;
    CF = result[-1];
}


CCshr


Note the special setting of CF and OF for byte shifts with 
shift counts greater than operand size. 
if (shiftCnt != 0) {
    SF = defaultSF;
    ZF = defaultZF;
    PF = defaultPF;
    AF = 1;
    if (shiftCnt >= operand size)
        CF = ((shiftCnt - operand size) == 0)
             & MSB of original data;
    else
        CF = result[-1];
    if (shiftCnt == 1)
        OF = high bit of result;
    else
OF 





CCrl 


The SF, ZF, AF, and PF flags are not affected. 
if (shiftCnt != 0) {
    OF = result[len] ^ result[len+1];
    CF = result[len+1];
}




















CCrr 


The SF, ZF, AF, and PF flags are not affected. 
if (shiftCnt != 0) {
    OF = result[len] ^ result[len-1];
    CF = result[-1];
}


3.3 LOAD/STORE INSTRUCTIONS


There are no MIPS format load/store instructions and no way to perform load/stores to native- 
mode address space. All load/store operations use X86 address and load/store semantics. 


3.3.1.1 XIOR - X86 I/O Read 





Encoding 
31:26 25:21 20:16 15:11 10:9 8 7:6 5:2 1 0
XIOR:
6 5 5 5 2 1 2 4 1 1
Format 
Description 
Performs an I/O read operation into GPR RS.
The effective address is calculated by adding the value specified in Offset to the contents of GPR
Base, modulo the address size specified in Addr Size. The linear address is calculated with respect
to the segment descriptor specified in Seg. The Size field specifies the number of bytes to load
and the location within the target register. The SubOp field specifies further special I/O read
semantic information.


No virtual-to-physical address translation is performed on the linear address. The operation
bypasses the cache and performs an I/O read bus cycle.


Usage Note: To perform an X86 IN operation, the Seg should be a flat 32-bit linear address
segment, the Base register should contain the I/O address (from DX or the immediate field), and
the Offset should be 0. The 16-bit base address value should be zero extended to 32 bits since the
address size from the translator could be 32 bits.





Exceptions 


None.


3.3.1.2 XIOW - X86 I/O Write
Encoding 
31:26 25:21 20:16 15:11 10:9 8 7:6 5:2 1 0 
Sizel Size0 
6 5 5 5 2 1 2 4 1 1 
Format 
Description 
Performs an I/O write operation of the data in GPR RS.
The effective address is calculated by adding the value specified in Offset to the contents of GPR
Base, modulo the address size specified in Addr Size. The linear address is calculated with respect
to the segment descriptor specified in Seg. The Size field specifies the number of bytes to store
and their location within the source register. The SubOp field specifies further special I/O write
semantic information.


No virtual-to-physical address translation is performed on the linear address. The operation
bypasses the cache and performs an I/O write bus cycle.


Usage Note: To perform an X86 OUT operation, the Seg should be a flat 32-bit linear address
segment, the Base register should contain the I/O address (from DX or the immediate field), and
the Offset should be 0. The 16-bit base address value should be zero extended to 32 bits since the
address size from the translator could be 32 bits.





Exceptions 


None.


3.3.1.3 XL - X86 Load


Encoding 
31:26 25:21 20:16 15:11 10:9 8 7:6 5:2 1 0
Size1 Size0





Description 


Performs a load with X86 addressing semantics into GPR Data Register (or RS | 1 for load of
64 bit size).


The effective address is calculated by adding the value specified in Offset to the contents of GPR
Base, modulo the address size specified in Addr Size. The linear address is calculated with respect
to the segment descriptor specified in Seg. The Size field specifies the number of bytes to load
and the location within the target register. The SubOp field specifies further special load
semantic information.
Performance 


If there is no data dependency stall, and the requested data is in the cache and contained within an
eight-byte aligned data unit, the load executes in one clock. Note that the loaded data is available
to the next ALU or store instruction without a pipeline stall. However, if the load is used as a
source to an addressing calculation in the next instruction, there is a one-cycle data dependency
stall (AGI) on the subsequent instruction.


If the requested data is not in the cache, execution of the instruction (and of all subsequent
instructions) stalls until the data is found and returned.


If the data spans an eight-byte aligned unit, this instruction is automatically decomposed into two
sequential loads to get the two data portions. The timing of each follows the rules above except
that the second load can't cause a data dependency stall.


If LOCK is specified, timing gets more complicated depending on the compatibility mode in
effect.
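
The eight-byte rule above amounts to a simple address check. The following C sketch shows one way
to express it; the function names are hypothetical and nothing here models cache or timing
behavior.

```c
#include <stdbool.h>
#include <stdint.h>

/* True if an access of `size` bytes at `addr` crosses an eight-byte boundary. */
static bool spans_qword(uint32_t addr, uint32_t size)
{
    return (addr & 7u) + size > 8u;
}

/* Number of bytes handled by the first of the two sequential accesses
 * (or the whole access if it does not cross a boundary). */
static uint32_t first_part_bytes(uint32_t addr, uint32_t size)
{
    return spans_qword(addr, size) ? 8u - (addr & 7u) : size;
}
```

A four-byte load at an address ending in 5 would be split into a three-byte and a one-byte
portion, while the same load at an address ending in 4 stays within one unit.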


Operation 


Exceptions 
SEGERR, DPAGE, ALIGN, DBRKPT


3.3.1.4 XLDESC - X86 Load Descriptor


Encoding 


31:26 25:21 20:16 15:11 10:9 8 7:6 5:2 1 0


XLDESC: 





Description 


Even though the load descriptor instruction is a load instruction, its documentation has moved to
the Segment Register function section to be with its tightly-bound companion, the XTI instruction.


3.3.1.5 XLEAD - X86 Load Effective Address - Displacement


Encoding 
31:26 25:21 20:16 15:11 10:9 8 7:6 5:3 2 1 0
Size1 2:1 -0 Size0
6 5 5 5 2 1 2 3 1 1 1
Description 
Performs an X86 effective address (EA) calculation by adding the base GPR to the value specified
by the displacement field (typically the DISP register), modulo the size defined by the address
size field, and storing the result back into a GPR with the size defined by the size field. On C2
an address size of operand size (TSR.OS) is encoded as 01, which C1-A interprets as stack address
size (TSR.SAS).
Architecture Note: The XLEAx instructions are not really load and store instructions and don't
need all of the load/store semantics (such as a segment register and the load/store SubOp field),
but they are included in this section since they have many features in common with load/store
instructions and are used with them.


Usage Note: This instruction is needed for three reasons: (1) to (partially) perform an X86 LEA
function, (2) to partially perform address calculations using a base register and an index register
and a displacement, and (3) to perform address calculations in trapped microcode since the XL and
XS instructions can't use register indirection.


This XLEAD instruction performs an X86 EA calculation using a base register and a displacement.
When combined with the XLEAI instruction, a full three component X86 LEA can be performed.
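
The way a three component address is assembled from the two instructions can be sketched in C.
The helper names are hypothetical, only 16-bit and 32-bit address sizes are modeled, and the
scaling shift is shown as an ordinary shift standing in for the separate shift instruction.

```c
#include <stdint.h>

/* Truncate a value to a 16- or 32-bit address size. */
static uint32_t wrap_addr(uint32_t v, int addr_bits)
{
    return (addr_bits == 16) ? (v & 0xFFFFu) : v;
}

/* Base + index, in the spirit of XLEAI (index already shifted if scaled). */
static uint32_t lea_indexed(uint32_t base, uint32_t index, int addr_bits)
{
    return wrap_addr(base + index, addr_bits);
}

/* Base + displacement, in the spirit of XLEAD. */
static uint32_t lea_disp(uint32_t base, uint32_t disp, int addr_bits)
{
    return wrap_addr(base + disp, addr_bits);
}

/* Full base + index*scale + disp address built from the three pieces. */
static uint32_t lea_full(uint32_t base, uint32_t index, unsigned scale_log2,
                         uint32_t disp, int addr_bits)
{
    uint32_t scaled = index << scale_log2;      /* separate shift instruction */
    uint32_t partial = lea_indexed(base, scaled, addr_bits);
    return lea_disp(partial, disp, addr_bits);
}
```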





Performance 


Same rules as for the XLEAI instruction. 


Exceptions 


none


3.3.1.6 XLEAI - X86 Load Effective Address - Indexed
Encoding 
31:26 25:21 20:16 15:11 10:9 7:6 5:3 2 1 0 
count | Sizel 0 Size0 
6 5 2 3 1 1 1 
Description 
Performs an X86 effective address (EA) calculation by adding one GPR to another and storing the
result, modulo the size defined by the address size field, back into a GPR with the size defined by
the size field. On C2 an address size of operand size (TSR.OS) is encoded as 01, which C1-A
interprets as stack address size (TSR.SAS).


Usage Note: This instruction is needed for two reasons: (1) to (partially) perform an X86 address
calculation involving both a base and an index register, and (2) to perform address calculations in
trapped microcode since the XL and XS instructions can't use register indirection.


This XLEAI instruction performs an X86 EA calculation using a non-shifted index register and a
base register. X86 LEA calculations using a scaled index value require that a separate shift
instruction be performed. X86 LEA calculations involving a register and a displacement can use the
XLEAD instruction.


Architecture Differences: The extension of Size for XLEAI to 4 bits is new to C2 as is allowing 
operand size encoding in 





Performance 
If there is no data dependency stall, this instruction executes in one clock. 


If either RS or RT is the result of a load or an ALU operation on the immediately preceding
instruction, there is an additional one-cycle data dependency stall.


Architecture Note: This is effectively a register-register add operation that executes in the
A-stage versus a normal ALU instruction that executes in the D-stage. Thus, address calculation
instructions that follow this one don't have a data-dependency stall on the XLEA instruction
results.





Exceptions 


none


3.3.1.7 XPOP[BR] - X86 POP (Load with Post Update)





Encoding 
31:26 25:21 20:16 15:11 10:9 8 7:6 5:2 1 0
Size1 Size0
6 5 5 5 2 1 2 4 1 1
31:26 25:21 20:16 15:11 10:9 8 7:6 5:2 1 0
XPOPBR: 110110, Offset Base SubOp | Addr Size-2:1 | Seg Size-0
Size1 Size0





Description 


Performs a load with X86 addressing semantics into GPR RS and updates the base register with a
new address.


The effective address is the value in GPR Base modulo the address size specified in Addr Size. The
linear address is calculated with respect to the segment descriptor specified in Seg. The Size
field specifies the number of bytes to load. The SubOp field specifies further special load
semantic information.


The address size portion of GPR Base is replaced with the result of adding the Offset to the
contents of Base. If RS and Base are the same register, then the behavior of the instruction
depends on the implementation:


= On C1-A the load data is loaded into RS and the address offset is not used.


= On C2 the update of Base with the bytes from the address offset can be thought of as occurring
first, then the load of RS with S bytes from memory. This is relevant when the number of bytes
updated in Base is larger than S.


Usage Note: This instruction is intended to directly perform an X86 POP function in one clock.
In order to perform the POP SP function correctly, the load data must have priority over the
updated Base value, if both registers are the same.
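
The load-with-post-update behavior, including the priority rule for POP SP, can be sketched in C.
Everything named here (stack_mem, merge_addr, pop_sketch, the register array) is hypothetical. The
sketch ignores segmentation, treats a short load as zero-extended, and requires the caller to keep
the access inside the toy memory.

```c
#include <stdint.h>
#include <string.h>

#define STACK_BYTES 64
static uint8_t stack_mem[STACK_BYTES];          /* toy stack segment */

/* Replace only the low `addr_bits` of a register with `v`. */
static uint32_t merge_addr(uint32_t reg, uint32_t v, int addr_bits)
{
    return (addr_bits == 16) ? ((reg & 0xFFFF0000u) | (v & 0xFFFFu)) : v;
}

/* Load `size` bytes (1..4) at regs[base], then post-update regs[base] by
 * `offset`. If rd == base the loaded data wins, as needed for POP SP. */
static void pop_sketch(uint32_t regs[], int rd, int base,
                       uint32_t offset, uint32_t size, int addr_bits)
{
    uint32_t ea = (addr_bits == 16) ? (regs[base] & 0xFFFFu) : regs[base];
    uint32_t data = 0;
    memcpy(&data, &stack_mem[ea % STACK_BYTES], size);
    regs[base] = merge_addr(regs[base], ea + offset, addr_bits);
    regs[rd] = data;                            /* written last: load has priority */
}
```

Writing the destination register after the base update is what makes the same-register case come
out with the popped data rather than the incremented pointer.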


To perform the X86 POP function, set: 
= SS, = OS, = indOS, = LSnop. 





The XPOPBR version of the instruction will read the linear address contained in the top of the
call/return stack (LINEAR_RET). If LINEAR_RET is in the instruction cache then a branch is
initiated to it and a hidden latch (RETSTK_HIT) is set. If LINEAR_RET is not in the instruction
cache then RETSTK_HIT is cleared. See the description of the POP version of XJ for how
RETSTK_HIT is used.


Architecture Differences: XPOPBR instruction is new to C2.


Usage Note: The XPOPBR instruction is intended to speed up subroutine return. It is used to
initiate a return to the address pushed on the call/return stack by a previous XBcc.PUSH.
Whether this branch was in fact correct is verified by a subsequent XJ.POP.





Performance 


Same rules as for the XL instruction. 


Operation 


if (XPOPBR) { // Subroutine return
    LINEAR_RET = RETSTK[TOS]; // Linear instruction pointer of expected return
    if (in_I_Cache(LINEAR_RET)) { // Expected return is in instruction cache
        start_branch(LINEAR_RET); // Get fetcher and translator going
        RETSTK_HIT = 1; // Latch to remember branch started
    } else {
        RETSTK_HIT = 0; // Indicate branch was not started
    }
}





Exceptions 
SEGERR, DPAGE, ALIGN, DBRKPT


3.3.1.8 XPUSH - X86 PUSH (Store with Pre-Update)


Encoding 
31:26 25:21 20:16 15:11 10:9 8 7:6 5:2 1 0
XPUSH: M1111 Off Base SubOp | Addr Size-2:] Seg |Size-0] Adi | 
Sizel Size0 





Description 


Performs a store with X86 addressing semantics of the data in GPR RS and updates the base
register with a new address.


The effective address is calculated by adding the displacement specified in Offset to the contents
of GPR Base, modulo the address size specified in Addr Size. The linear address is calculated with
respect to the segment descriptor specified in Seg. The Size field specifies the number of bytes to
store. The SubOp field specifies further special store semantic information.


If the Size field indicates that four bytes are to be pushed and RS is in the range 8 to 15 (a
segment register) then the upper two bytes of the data pushed are zero.


If the Size field is 16H (encoded as 8L) then it is the upper 2 bytes of GPR RS that are stored. In
this case the store size is controlled by TSR.OS. If TSR.OS is 32 then the upper two bytes of the
data pushed are zero.


The address size portion of GPR Base is replaced by the effective address of the store.


Usage Note: This instruction is intended to perform an X86 PUSH function in one clock. Note
that the pre-update function correctly handles the PUSH SP function: the old SP value is stored
before SP is updated.
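
The ordering that makes PUSH SP work can be shown with a short C sketch. The names push_mem and
push_sketch and the register array are hypothetical; a 32-bit address size is assumed, segmentation
is ignored, and the caller must keep the access inside the toy memory.

```c
#include <stdint.h>
#include <string.h>

#define STACK_BYTES 64
static uint8_t push_mem[STACK_BYTES];            /* toy stack segment */

/* Store `size` bytes (1..4) of regs[rs] at regs[base] + offset, then write
 * the effective address back to regs[base]. The data is read before the
 * base is updated, so pushing the stack pointer stores its old value. */
static void push_sketch(uint32_t regs[], int rs, int base,
                        int32_t offset, uint32_t size)
{
    uint32_t data = regs[rs];                    /* sampled first */
    uint32_t ea = regs[base] + (uint32_t)offset; /* 32-bit address size assumed */
    memcpy(&push_mem[ea % STACK_BYTES], &data, size);
    regs[base] = ea;                             /* pre-update result */
}
```

With an offset of -4 and a size of 4, the base register ends up pointing at the newly stored
value, which is the state an X86 PUSH leaves behind.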


To perform the PUSH function, set: 
= SS, = OS, = indMOS, = LSnop. 





Performance 


Follows the same rules as for the XS instruction. 


Exceptions 
SEGERR, DPAGE, ALIGN, DBRKPT


3.3.1.9 XPUSHIP - X86 PUSH NSIP (Store with Pre-Update)
Encoding 
31:26 25:21 20:16 15:11 10:9 8 7:6 5:2 1 0
Size1 Size0
6 5 5 5 2 1 2 4 1 1 
Description 
Performs a store with X86 addressing semantics of the NSIP register.
The effective address is calculated by adding the displacement specified in Offset to the contents
of GPR Base, modulo the address size specified in Addr Size. The linear address is calculated with
respect to the segment descriptor specified in Seg. The Size field specifies the number of bytes to
store. The SubOp field specifies further special store semantic information.


More information on the semantics of the function is found on page 2-7.


The address size portion of GPR Base is updated to contain the effective address of the store;
thus, this is a store with pre-update type instruction.


Architecture Note: There is no general store of a CP2 data register instruction (which is where
NSIP lives). Thus, a special form is provided for NSIP.


Usage Note: This instruction is intended to perform a “push IP” function needed for fast CALL
execution. To perform this operation set


= SS, = OS, = indMOS, = LSnop.





Performance 


Follows the same rules as for the XS instruction. 


Exceptions 
SEGERR, DPAGE, ALIGN, DBRKPT


3.3.1.10 XS - X86 Store








Encoding 
31:26 25:21 20:16 15:11 10:9 8 7:6 5:2 1 0
Size1 Size0
6 5 5 5 2 1 2 4 1 1
31:26 25:21 20:16 15:11 10:9 8 7:6 5:2 1 0
XS2: 111001, | Offset | Base | RS | SubOp | Addr | Size-2:1 | Seg | Size-0 | Addr
Size1 Size0
Description 
Performs a store with X86 addressing semantics of the data in GPR. 
The effective address is calculated by adding the displacement specified in to the contents of 
GPR modulo the address size specified in . The linear address is calculated with respect 
to the segment descriptor specified in . The field specifies the number of bytes to store 
and the location within the target register. The field specifies further special store semantic 
information. The difference between XS and XS2 is merely the choice of data sizes and locations as 
defined in section 5.1.5. 
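
The bit layout in the encoding table can be unpacked with ordinary shifts and masks. The following Python sketch splits a 32-bit instruction word into the ten fields listed in the XS2 row; the dictionary keys are lower-case renderings of that row's labels, and the sample word is made up for illustration rather than taken from real code.

```python
# (name, high bit, low bit) for each field of the XS/XS2 layout.
XS_FIELDS = [
    ("opcode",     31, 26),
    ("offset",     25, 21),
    ("base",       20, 16),
    ("rs",         15, 11),
    ("subop",      10,  9),
    ("addr_size1",  8,  8),
    ("size_2_1",    7,  6),
    ("seg",         5,  2),
    ("size_0",      1,  1),
    ("addr_size0",  0,  0),
]


def decode_xs(word):
    """Split a 32-bit instruction word into its XS-format bit fields."""
    fields = {}
    for name, hi, lo in XS_FIELDS:
        width = hi - lo + 1
        fields[name] = (word >> lo) & ((1 << width) - 1)
    return fields


# Hypothetical word: XS2 opcode 111001, base register 4, RS register 1.
word = (0b111001 << 26) | (4 << 16) | (1 << 11)
print(decode_xs(word))
```

Keeping the layout in a table rather than in hand-written shifts makes it easy to check that the field widths add up to 32 bits.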
Performance 


If there is no data dependency stall, the requested data is in the cache and contained within an 
eight-byte aligned unit, and there is no pending store operation, the store executes in one clock. 


If either Base or RS is the result of a load or an ALU operation in the immediately preceding 
instruction, there is an additional one-cycle data dependency stall. 


There is a store buffer between the D-stage and the cache/bus unit. Thus, stores that miss in the 
cache or that are blocked by previous load or store operations take an indeterminate time. The rules 
for store queuing are complex and are discussed elsewhere. 


If the data crosses the boundary of an eight-byte aligned unit, this instruction is automatically 
decomposed into two sequential stores to handle the two data portions. The timing of each follows 
the rules above except that the second store cannot cause a data dependency stall. 
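
The eight-byte alignment rule can be checked with simple integer arithmetic. The sketch below is a model of the rule as stated, not of the hardware: the function name is hypothetical, and it assumes a store of at most eight bytes so that at most two aligned units are involved.

```python
def split_store(address, nbytes):
    """Return the (address, length) pieces of a store, split at an 8-byte boundary."""
    first_unit = address // 8
    last_unit = (address + nbytes - 1) // 8
    if first_unit == last_unit:
        return [(address, nbytes)]           # fits inside one aligned unit
    boundary = (first_unit + 1) * 8          # start of the next aligned unit
    return [(address, boundary - address),
            (boundary, address + nbytes - boundary)]


print(split_store(0x1004, 4))   # one piece: bytes 0x1004..0x1007
print(split_store(0x1006, 4))   # two pieces: 0x1006..0x1007 and 0x1008..0x1009
```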


If LOCK is in effect (specified on a previous load operation), timing gets more complicated, 
depending on the compatibility mode in effect. 


Operation 


Exceptions 
SEGERR, DPAGE, ALIGN, DBRKPT 

3.3.1.11 XSI, XPUSHI - X86 Store/PUSH Immediate 

Encoding 

31:26 25:21 20:16 15:11 10:9 8 7:6 5:2 1 0 
XSI: SubOp | Addr Size1 | Size-2:1 | Seg | Size-0 | Addr Size0 
6 5 5 5 2 1 2 4 1 1 

31:26 25:21 20:16 15:11 10:9 8 7:6 5:2 1 0 
XPUSHI: 111010 | Offset | Base | SubOp | Addr Size1 | Size-2:1 | Seg | Size-0 | Addr Size0 




Description 


Performs a store or a PUSH with X86 addressing semantics of the value specified in the IMMED 
register. The semantics are the same as for the XS and XPUSH instructions except that the 
IMMED register is stored instead of the RS register. 


Architecture Note: There is no way to address the IMMED register as a source in a normal store 
instruction. Thus, a special form is provided for store IMMED. 


Usage Note: This instruction is intended to directly perform the corresponding X86 
MOV [ea],immed or PUSH 10H instructions in one clock. 


To perform this MOV operation, set: 
= AS, = OS, = LSnop. 





Operation 
Follows the same rules as for the XS and XPUSH instructions. 


Exceptions 
SEGERR, DPAGE, ALIGN, DBRKPT 

3.3.1.12 XSU - X86 Store with Post-Update 

Encoding 

31:26 25:21 20:16 15:11 10:9 8 7:6 5:2 1 0 
XSU: 
6 5 5 5 2 1 2 4 1 1 
Description 


Performs a store with X86 addressing semantics of the data in GPR and updates the base 
register with a new address. 


The effective address is the value in GPR modulo the address size specified in . The 
linear address is calculated with respect to the segment descriptor specified in . The field 
specifies the number of bytes to store. The field specifies further special store semantic 
information. 


, the address size portion of GPR is replaced 
with the result of adding the to the contents of 
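
The ordering that distinguishes post-update from pre-update (store first, using the old base value, then update the base) can be sketched as follows. This is a simplified model under stated assumptions: memory is a Python dictionary of bytes, the segment base is ignored so the linear address equals the effective address, and the value added to the base is taken to be the instruction's displacement, which the text above does not spell out. All names are hypothetical.

```python
def store_post_update(memory, regs, base, rs, displacement, nbytes, addr_bits):
    """Store nbytes of regs[rs] at the address in regs[base], then update regs[base]."""
    mask = (1 << addr_bits) - 1
    ea = regs[base] & mask                        # address comes from the OLD base value
    data = regs[rs] & ((1 << (8 * nbytes)) - 1)
    for i in range(nbytes):                       # little-endian byte order, as on x86
        memory[ea + i] = (data >> (8 * i)) & 0xFF
    # Post-update: replace only the address-size portion of the base register.
    new_low = (regs[base] + displacement) & mask
    regs[base] = (regs[base] & ~mask & 0xFFFFFFFF) | new_low


regs = {"EDI": 0x2000, "EAX": 0x11223344}
memory = {}
store_post_update(memory, regs, "EDI", "EAX", 4, 4, 32)
print(hex(regs["EDI"]), [hex(memory[0x2000 + i]) for i in range(4)])
```

A string-store style loop is the natural use: each iteration stores at the current pointer and leaves the pointer advanced for the next one.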


Performance 


Follows the same rules as for the XS instruction. 


Exceptions 
SEGERR, DPAGE, ALIGN, DBRKPT 

3.4 CONTROL REGISTERS AND MICRO-OPERATIONS 


This section describes micro-operations that move data to and from control registers and that 
may be useful as Alternate Instructions. 


CP2 Control Registers 


Time Stamp Counter (Upper 32 bits) 
Time Stamp Counter (Lower 32 bits) 
X86 CR0 
X86 EFLAGS 


3.4.1.1 CTC2 - Store To CP2 


Encoding 
31:26 25:21 20:16 15:11 10:8 7:5 4:0 
CTC2: 100000 | 00000 | C2RD | 000 | DPcntl | 11001 
6 5 5 5 3 3 5 


Description 
Moves the contents of general purpose register RS to the CP2 control register C2RD. 


DPcntl does not control the size of the result stored to the CP2 control register except for 
EFLAGS: in this case, a size of 16 or 32 bits can be specified directly or indirectly through OS. 

3.4.1.2 CFC2 - Move Control From CP2 


Encoding 
31:26 25:21 20:16 15:11 10:6 
CFC2: XMISC (101000) | 00000 | | C2RD | CFC2 (11111) 


Description 
The contents of CP2 control register are loaded into GPR. 

3.4.1.3 XJ[XAI] - X86 Jump and optionally exit Alternate Instruction Execution Mode 


Encoding 


31:26 25:21 20:16 15:11 
XJ: 000110 | 00000 | 00000 | | 0001 | 
XJXAI: 11 (bits 1:0) 


XJSize 
address size - 16 or 32 
operand size - 16 or 32 





Description 


XJ performs an absolute branch with X86 addressing semantics. XJXAI branches like XJ and 
also exits Alternate Instruction Execution Mode (encoded with bits 1:0 = ‘11’). The target address 
is the contents of the GPR modulo the current operand size (as opposed to the address size, as for 
load/store instructions). The linear address is calculated with respect to the CS segment descriptor. 
XJSize controls the size of the EA calculation. 


The NSIP is updated with the calculated target address. This has the effect of clearing the 
upper 16 bits of NSIP when an XJ instruction is executed with a 16-bit operand size. 
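
The effect of the operand size on the branch target amounts to a single mask operation. The following Python sketch shows it; the function name and the `operand_bits` parameter are hypothetical, and segment limit checking (the SEGERR case) is not modeled.

```python
def xj_target(gpr_value, operand_bits):
    """New NSIP value for an XJ branch: register contents modulo the operand size."""
    return gpr_value & ((1 << operand_bits) - 1)


print(hex(xj_target(0x12345678, 32)))   # full 32-bit target
print(hex(xj_target(0x12345678, 16)))   # upper 16 bits cleared
```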


Exceptions 
SEGERR 

CHAPTER 


X87 FLOATING-POINT MICRO-OPERATIONS (C5XL) 


This chapter describes the x87 floating-point registers and micro-operations available for use in 
Alternate Instruction enabled mode in the VIA C5XL (Nehemiah) processor. The encodings of 
x87 floating-point micro-operations differ in earlier versions of the VIA C3 processor family 
(C5A, C5B, C5C) and are not included in this document. 


4.1 X87 FLOATING POINT REGISTERS (C5XL) 


All VIA C3 processors implement eight 80-bit registers corresponding to the standard x87 
floating-point registers. In addition to these eight registers, the processor implements additional 
extended x87 floating-point registers, as described in the following table: 


 | C5XL (Nehemiah) 
Standard x87 floating-point registers | Eight standard x87 floating-point registers: FP0-FP7 
Extended x87 floating-point registers | Ten extended x87 floating-point registers: FP8-FP17 

4.2 X87 FLOATING-POINT MICRO-OPERATIONS (C5XL) 


X87 floating-point micro-operations have certain fields in common to control the effects on the 
top-of-stack (TOS) field in FPSW and to control the precision, rounding, and response to 
exception cases. 


Fmt2 Precision Controls 


Fmt2[15:13] | Round | Mask | PC | C1 | Description 
000 | FPCW | FPCW | FPCW | Per result | Normal 
001 | FPCW | | 64 | | 
010 | | | | | C1 on stack fault only 
011 | FPCW | PE | 64 | Don't clear | 
Undefined 


TOSCtrl - Top Of Stack Control 


TOSCtrl[7:6] 
00 

4.2.1 FADD, FSUB, FSUBR, FMUL, FDIV, FDIVR 


FADD, FSUB, FSUBR, FMUL, FDIV, FDIVR - x87 Floating-point Add, Subtract, 
Subtract Reverse, Multiply, Divide, Divide Reverse 

Encoding 
31:26 | 25:21 | 20:16 | 15:13 | 12:9 | 8 | 7:6 | 5:0 
010001 | | | Fmt2 | 0000 | R | TOSCtrl | SubOp 
6 | 5 | 5 | 3 | 4 | 1 | 2 | 6 

Description 
The FADD, FSUB, FSUBR, FMUL, FDIV and FDIVR instructions operate on the floating-point 
values in x87 floating-point registers FRS and FRD and store the result in FRD. Note that 
FSUBR and FDIVR are like FSUB and FDIV except the input operands (FRS and FRD) are 
reversed (FRD is always the destination). 


FADD 
FSUB 
FSUBR 
FMUL 
FDIV 
FDIVR 





Fields 


FRS - Source x87 FP register 
FRD - Source and Destination x87 FP register 
Fmt2 - Controls rounding, exception masking, precision, and flags 
R 
0 - No record 
1 - Record FP environment 
TOSCtrl - encode the function to be performed on the FPSW.TOP field 


Instruction encoding and operation 


Instruction | SubOp | Operation 
FADD | 000000 | FRD <- FRD + FRS 
FSUB | 000001 | FRD <- FRD - FRS 

4.2.2 FSQRT, FABS, FCHS 


FSQRT, FABS, FCHS - x87 Floating-point Square Root, Absolute Value, Change Sign 


Encoding 


31:26 25:21 20:16 15:13 12:9 
FSQRT, FABS, FCHS: 010001 | Fmt2 | 0000 | TOSCtrl 

Description 





The unary operations FSQRT, FABS and FCHS operate on the value in x87 floating-point 
register FRS and store the result in FRD. FABS and FCHS will copy FRS to FRD and set the sign bit 
of FRD to ‘0’ (FABS) or invert it (FCHS). 
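
Because FABS and FCHS only touch the sign bit, they can be modeled as bit operations on the raw 80-bit register image. The sketch below assumes the standard x87 extended-precision layout (sign in bit 79, 15-bit exponent, 64-bit significand) and represents a register as a Python integer; the function names are illustrative, and the Fmt2, R and TOSCtrl side effects are not modeled.

```python
REG_MASK = (1 << 80) - 1
SIGN_BIT = 1 << 79          # bit 79 is the sign bit of the 80-bit format


def fabs(frs):
    """FRD <- ABS(FRS): copy the register image and clear the sign bit."""
    return frs & ~SIGN_BIT & REG_MASK


def fchs(frs):
    """FRD <- -(FRS): copy the register image and invert the sign bit."""
    return (frs ^ SIGN_BIT) & REG_MASK


# Register images of +1.0 and -1.0: exponent 0x3FFF, significand 0x8000000000000000.
plus_one = (0x3FFF << 64) | (0x8000 << 48)
minus_one = SIGN_BIT | plus_one
print(fabs(minus_one) == plus_one, fchs(plus_one) == minus_one)
```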


Fields 


FRS - Source x87 FP register 
FRD - Destination x87 FP register 
Fmt2 - Controls rounding, exception masking, precision, and flags 
R 
0 - No record 
1 - Record FP environment 
TOSCtrl - encode the function to be performed on the FPSW.TOP field 


Instruction encoding and operation 


Instruction | SubOp | Operation 
FSQRT | 000100 | FRD <- SQRT(FRS) 
FABS | 000101 | FRD <- ABS(FRS) 
FCHS | 000111 | FRD <- -(FRS) 

CHAPTER 


MMX MICRO-OPERATIONS 


This chapter describes the MMX registers and MMX micro-operations available for use in 
Alternate Instruction execution mode. There are micro-operations corresponding to all 
register-to-register x86 MMX instructions. There are no alternate instructions for loading data 
directly from memory to an MMX register. Also lacking are alternate instructions that combine a 
memory load with another operation. This is because these micro-operations require more than 
32 bits and alternate instructions are restricted to 32 bits. Use x86 MMX instructions to move 
data to/from memory. 

5.1 MMX REGISTERS 


All VIA C3 processors implement eight 64-bit registers corresponding to the standard x86 MMX 
registers. In addition to these eight registers the processor implements additional extended MMX 
registers, as described in the following table: 


 | Samuel, Samuel 2, Ezra | C5XL (Nehemiah) 
Standard x86 MMX registers | Eight standard x86 MMX registers: MM0-MM7 | Eight standard x86 MMX registers: MM0-MM7 
Extended MMX registers | Two extended MMX registers: MM8-MM9 | Five extended MMX registers: MM8-MM9, MM13-MM15 

5.2 MMX MICRO-OPERATIONS 


5.2.1 MMXADD / MMXSUB 


Encoding 
31:26 25:21 20:16 10:8 7:6 5:4 3 2 
MMXADD, MMXSUB: 010100 | RS | 000 | Sz | 01 | S | T 





Description 
Implements the x86 instructions with operands in two MMX registers and 
destination in an MMX register. Note that the x86 instruction encoding requires that one of the 
source registers also be the destination register; this micro-operation allows the destination MMX 
register (RD) to be different from the two source registers (RT, RS). 


Fields 


Sz - Source/Dest Size 
00 - 8 bit 
01 - 16 bit 
10 - 32 bit 
RT - Source MMX register 
RS - Source MMX register 
RD - Destination MMX register 
K 
0 - Addition 
1 - Subtraction 
S - Signed 
0 - Unsigned 
1 - Signed 
T - Saturation 
0 - Wrap 
1 - Saturate 


Equivalent x86 instruction encoding 


PADDB 
PADDW 
PADDD 
PADDSB | 0 
PADDSW 
PADDUSB 
PADDUSW | 0 
PSUBB 
PSUBW 
PSUBD 
PSUBSB 
PSUBSW 
PSUBUSB | 00 | 1 
PSUBUSW | 01 | 1 
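
The way the Sz, K, S and T fields combine can be illustrated with a lane-by-lane model of a 64-bit packed add or subtract. This Python sketch is an assumption-laden illustration, not the hardware algorithm: registers are plain integers, the parameter names are hypothetical stand-ins for the fields, and the subtraction is taken to be RS minus RT, which the text does not state.

```python
def mmx_addsub(rt, rs, elem_bits, subtract, signed, saturate):
    """Lane-wise add (rs + rt) or subtract (rs - rt, assumed order) of 64-bit values."""
    mask = (1 << elem_bits) - 1
    if signed:
        lo, hi = -(1 << (elem_bits - 1)), (1 << (elem_bits - 1)) - 1
    else:
        lo, hi = 0, mask
    result = 0
    for shift in range(0, 64, elem_bits):
        a = (rs >> shift) & mask
        b = (rt >> shift) & mask
        if signed:                          # reinterpret lanes as two's complement
            a -= (a >> (elem_bits - 1)) << elem_bits
            b -= (b >> (elem_bits - 1)) << elem_bits
        r = a - b if subtract else a + b
        if saturate:
            r = max(lo, min(hi, r))         # clamp to the lane's range
        result |= (r & mask) << shift       # wrap into the lane; no carry between lanes
    return result


# Unsigned byte lanes: 0xF0 + 0x20 wraps to 0x10 but saturates to 0xFF.
print(hex(mmx_addsub(0x20, 0xF0, 8, False, False, False)))
print(hex(mmx_addsub(0x20, 0xF0, 8, False, False, True)))
```

When T selects wrap, the signed and unsigned results are identical, so S only matters for the saturating forms.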
5.2.2 MMXPACK 


Encoding 
31:26 25:21 20:16 15:11 10:8 
MMXPACK 
6 5 5 5 3 2 3 1 2 


Description 


Implements the x86 instructions with operands in two 
MMX registers and destination in an MMX register. Note that the x86 instruction encoding requires 
that one of the source registers also be the destination register; this micro-operation allows the 
destination MMX register (RD) to be different from the two source registers (RT, RS). 


Fields 


Sz - Source/Dest Size 
00 - 8 bit 
01 - 16 bit 
RT - Source MMX register 
RS - Source MMX register 
RD - Destination MMX register 
S - Signed 
0 - Unsigned 
1 - Signed 


Equivalent x86 instruction encoding 


PACKUSWB | 00 | 0 
5.2.3 MMXUNPACK 


Encoding 
31:26 | 25:21 | 20:16 | 15:11 | 10:8 | 7:6 | 5:3 | 2:1 | 0 
MMXUNPK: 010100 | RT | RS | RD | 000 | Sz | 001 | 00 | H 
6 | 5 | 5 | 5 | 3 | 2 | 3 | 2 | 1 


Description 


Implements the x86 instructions with operands in two MMX 
registers and destination in an MMX register. Note that the x86 instruction encoding requires that 
one of the source registers also be the destination register; this micro-operation allows the 
destination MMX register (RD) to be different from the two source registers (RT, RS). 


Fields 


Sz - Source/Dest Size 
01 - 16 bit 
10 - 32 bit 
11 - 64 bit 
RT - Source MMX register 
RS - Source MMX register 
RD - Destination MMX register 
H - Source Half 
0 - Low 
1 - High 


Equivalent x86 instruction encoding 


PUNPCKHDQ | 11 | 1 
PUNPCKLBW | 01 | 0 
PUNPCKLWD | 10 | 0 
PUNPCKLDQ | 11 | 0 
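
The interleaving performed by the unpack operations is easier to see in code than in prose. The Python sketch below follows the behavior of the x86 PUNPCKL/PUNPCKH instructions; `first` plays the role of the x86 destination operand and `second` the x86 source operand, and how these map onto RT and RS is an assumption not confirmed by the text. The `src_bits` parameter is the width of each source element, so the result elements are twice as wide.

```python
def mmx_unpack(first, second, src_bits, high):
    """Interleave elements from the low or high halves of two 64-bit values."""
    mask = (1 << src_bits) - 1
    per_half = 32 // src_bits               # source elements in one 32-bit half
    start = per_half if high else 0
    result = 0
    for i in range(per_half):
        a = (first >> ((start + i) * src_bits)) & mask
        b = (second >> ((start + i) * src_bits)) & mask
        pair = a | (b << src_bits)          # element from `first` goes in the low position
        result |= pair << (i * 2 * src_bits)
    return result


# PUNPCKLBW-style interleave of the low four bytes of each operand.
print(hex(mmx_unpack(0x0706050403020100, 0x0F0E0D0C0B0A0908, 8, False)))
```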
5.2.4 LOGICALS 


Encoding 

31:26 25:21 20:16 15:11 10:8 7:6 5:3 2:1 0 
MMXAND 
MMXANDN 
MMXOR 
MMXXOR 
6 5 5 5 3 2 3 2 1 

Description 


Implements the x86 instructions with operands in two 
MMX registers and destination in an MMX register. Note that the x86 instruction encoding requires 
that one of the source registers also be the destination register; this micro-operation allows the 
destination MMX register (RD) to be different from the two source registers (RT, RS). 


Fields 


L - Logical Operation 
00 - AND 
01 - AND NOT 
10 - OR 
11 - XOR 
RT - Source MMX register 
RS - Source MMX register 
RD - Destination MMX register 


Equivalent x86 instruction encoding 


PAND | 00 
PANDN | 01 
POR | 10 
PXOR | 11 
5.2.5 MOVES 


Encoding 
31:26 25:21 20:16 15:11 10:8 
6 5 5 

Description 
Implements the x86 instructions with source operand in an MMX register and 
destination in an MMX register. 

Fields 


S - Replicate Low 32 bits in High 
0 - High 32 bits from high 32 bits 
1 - High 32 bits copy of low 32 bits 
RS - Source MMX register 
RD - Destination MMX register 


Equivalent x86 instruction encoding 


MOVQ | 0 
5.2.6 COMPARES 


Encoding 
31:26 | 25:21 | 20:16 | 15:11 | 10:8 | 7:6 | 5:3 | 2 | 1:0 
MMXCMP: 010100 | RT | RS | RD | 000 | Sz | 101 | E | 00 
6 | 5 | 5 | 5 | 3 | 2 | 3 | 1 | 2 


Description 


Implements the x86 instructions with operands in two 
MMX registers and destination in an MMX register. Note that the x86 instruction encoding requires 
that one of the source registers also be the destination register; this micro-operation allows the 
destination MMX register (RD) to be different from the two source registers (RT, RS). 


Fields 


Sz - Source/Dest Size 
00 - 8 bit 
01 - 16 bit 
10 - 32 bit 
RT - Source MMX register 
RS - Source MMX register 
RD - Destination MMX register 
E - Compare Type 
0 - Greater Than 
1 - Equal 


Equivalent x86 instruction encoding 


PCMPEQW | 01 | 1 
PCMPGTW | 01 | 0 
PCMPGTD | 10 | 0 
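
The compare operations produce a mask, not a flag: each lane of the result is all ones when the comparison holds and all zeros otherwise. The Python sketch below follows the x86 PCMPEQ/PCMPGT behavior, where greater-than is a signed comparison. The argument names are hypothetical, and which of RT and RS is the left-hand operand of the greater-than test is an assumption.

```python
def mmx_compare(a, b, elem_bits, equal):
    """Lane-wise compare of two 64-bit values; true lanes become all ones."""
    mask = (1 << elem_bits) - 1
    sign = 1 << (elem_bits - 1)
    result = 0
    for shift in range(0, 64, elem_bits):
        x = (a >> shift) & mask
        y = (b >> shift) & mask
        if equal:
            hit = x == y
        else:
            # Signed greater-than: flipping the sign bit maps signed order
            # onto unsigned order, so a plain integer compare is enough.
            hit = (x ^ sign) > (y ^ sign)
        if hit:
            result |= mask << shift
    return result


a = 0x0001FFFF00050005
b = 0x0000000100050006
print(hex(mmx_compare(a, b, 16, False)))   # greater-than mask
print(hex(mmx_compare(a, b, 16, True)))    # equality mask
```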





5-8 MMX Chapter 5 


VIA Confidential VIA C3 Alternate Instruction Set Programming Reference 
November 2002 


5.2.7 MULTIPLIES 


Encoding 


31:26 25:21 20:16 15:11 10:9 


MMXMULL 
MMXMULH 
MMXMULADD 


Description 


Implements the x86 instructions with operands in two MMX 
registers and destination in an MMX register. Note that the x86 instruction encoding requires that 
one of the source registers also be the destination register; this micro-operation allows the 
destination MMX register (RD) to be different from the two source registers (RT, RS). 


Fields 


M - Multiply Type 
00 - Low 
01 - High 
10 - Multiply Add 
RT - Source MMX register 
RS - Source MMX register 
RD - Destination MMX register 


Equivalent x86 instruction encoding 


PMULLW | 00 
PMULHW | 01 
PMADDWD | 10 
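
The three multiply types differ only in which part of the signed 16-bit by 16-bit products is kept. The following Python sketch models the x86 PMULLW, PMULHW and PMADDWD results on 64-bit integers; the function and parameter names are hypothetical, and the string values of `kind` stand in for the M field.

```python
def _signed16(x):
    """Interpret a 16-bit pattern as a two's complement integer."""
    return x - 0x10000 if x & 0x8000 else x


def _words(value):
    """Split a 64-bit value into four signed 16-bit lanes, lowest lane first."""
    return [_signed16((value >> (16 * i)) & 0xFFFF) for i in range(4)]


def mmx_multiply(a, b, kind):
    """kind is 'low' (PMULLW), 'high' (PMULHW) or 'madd' (PMADDWD)."""
    products = [x * y for x, y in zip(_words(a), _words(b))]
    if kind == "madd":
        # Add adjacent products to form two 32-bit results.
        low = (products[0] + products[1]) & 0xFFFFFFFF
        high = (products[2] + products[3]) & 0xFFFFFFFF
        return low | (high << 32)
    if kind == "low":
        lanes = [p & 0xFFFF for p in products]
    elif kind == "high":
        lanes = [(p >> 16) & 0xFFFF for p in products]   # >> is arithmetic in Python
    else:
        raise ValueError(kind)
    return sum(lane << (16 * i) for i, lane in enumerate(lanes))


a = 0x00014000FFFD0002      # lanes: 2, -3, 0x4000, 1
b = 0xFFFF000400040003      # lanes: 3, 4, 4, -1
print(hex(mmx_multiply(a, b, "low")), hex(mmx_multiply(a, b, "high")))
```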
5.2.8 SHIFT 


Encoding 


31:26 25:21 20:16 15:11 10:8 7:6 5:3 2:1 0 
MMXSHL 
MMXSHR 
MMXSAR 
6 5 5 5 3 2 3 2 1 

Description 
Implements the x86 instructions with operands in two 
MMX registers and destination in an MMX register. Note that the x86 instruction encoding requires 
that one of the source registers also be the destination register; this micro-operation allows the 
destination MMX register (RD) to be different from the two source registers (RT, RS). 


Fields 


Sz - Source/Dest Size 
01 - 16 bit 
10 - 32 bit 
11 - 64 bit 
RT - MMX register with shift count 
RS - Source MMX register 
RD - Destination MMX register 
S - Shift type 
00 - Arithmetic Right 
01 - Right 
10 - Left 
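
The three shift types can be modeled lane by lane as shown below. The Python sketch follows the x86 MMX shift behavior, in which an oversized count clears the lane for logical shifts and fills it with the sign bit for arithmetic shifts; whether the micro-operations treat oversized counts the same way is an assumption. The names are hypothetical, and `count` stands for the value held in RT.

```python
def mmx_shift(src, count, elem_bits, kind):
    """kind is 'left', 'right' (logical right) or 'arith' (arithmetic right)."""
    mask = (1 << elem_bits) - 1
    result = 0
    for shift in range(0, 64, elem_bits):
        x = (src >> shift) & mask
        if kind == "left":
            r = (x << count) & mask if count < elem_bits else 0
        elif kind == "right":
            r = x >> count if count < elem_bits else 0
        elif kind == "arith":
            if x >> (elem_bits - 1):              # negative lane
                x -= 1 << elem_bits
            r = (x >> min(count, elem_bits - 1)) & mask
        else:
            raise ValueError(kind)
        result |= r << shift
    return result


src = 0x8000000100F0FFFF
for kind in ("left", "right", "arith"):
    print(kind, hex(mmx_shift(src, 4, 16, kind)))
```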